Internet Traffic Classification with Federated Learning

Electronics ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 27
Author(s):  
Hyunsu Mun ◽  
Youngseok Lee

Internet traffic classification is a standard problem for ISPs and mobile carriers, and it has been studied extensively using statistical packet header information, deep packet inspection, and machine learning. With the recent spread of end-to-end encryption and dynamic port policies, machine learning and deep learning have become essential for improving the accuracy of packet classification. In addition, ISPs and mobile carriers must handle privacy carefully when collecting user packets for accounting or security. Federated learning, a recent development in distributed machine learning, carries out training collaboratively on the clients without uploading data to a central server. Although federated learning provides an on-device learning framework that protects user privacy, its feasibility and performance for Internet traffic classification have not been fully examined. In this paper, we propose a federated-learning traffic classification protocol (FLIC), which can achieve an accuracy comparable to centralized deep learning for Internet application identification without privacy leakage. FLIC can classify new applications on the fly when a participant joins the learning process with a new application, which has not been done in previous work. We implement a prototype of the FLIC clients and server with TensorFlow, in which the clients gather packets, perform on-device training, and exchange the training results with the FLIC server. We demonstrate that federated learning-based packet classification achieves an accuracy of 88% under non-independent and identically distributed (non-IID) traffic across clients. When a client participating in learning added a new application to be classified dynamically, an accuracy of 92% was achieved.
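The server-side step of exchanging training results can be illustrated with federated averaging, a common aggregation rule in federated learning. The abstract does not state which aggregation rule FLIC uses, so the following is only a minimal sketch under that assumption; the function name `federated_average` and the flat-list representation of model weights are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client weight vectors, weighting each by its local sample count.

    Assumption: every client reports a flat weight vector of equal length
    plus the number of local training samples. Raw packets never leave
    the client; only these weights are sent to the server.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client weight vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all weight vectors must have the same length")
        for i, value in enumerate(weights):
            # Clients with more local data contribute proportionally more.
            averaged[i] += value * size / total
    return averaged


# Two hypothetical clients: the second holds three times as much traffic.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a real deployment the weight vectors would be the parameters of the TensorFlow model trained on each client, and the server would send the averaged parameters back for the next training round.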

Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1376
Author(s):  
Yung-Fa Huang ◽  
Chuan-Bi Lin ◽  
Chien-Min Chung ◽  
Ching-Mu Chen

In recent years, privacy awareness has grown, and many Internet services have chosen to use encrypted protocols. To improve quality of service (QoS), this paper discusses the classification of encrypted network traffic behaviors based on machine learning. Traditional traffic classification methods, such as IP/ASN (Autonomous System Number) analysis, port-based classification, and deep packet inspection, can classify traffic behavior but cannot effectively handle encrypted traffic. This paper therefore proposes a hybrid traffic classification (HTC) method that combines machine learning with IP/ASN analysis and deep packet inspection. Majority voting is also used to classify traffic of different QoS classes quickly and accurately. Experimental results show that the proposed HTC method can effectively classify different encrypted traffic. With majority voting at K = 13, the classification accuracy can be further improved by 10%. In particular, when the network data use the same protocol, the proposed HTC method can effectively classify traffic with different behaviors using the differentiated services code point (DSCP) mark.
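The majority-voting step can be sketched as follows. The abstract does not define K, so this example assumes K is the number of consecutive per-sample predictions pooled into one decision; the function name `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Hashable], k: int = 13) -> List[Hashable]:
    """Collapse each run of k consecutive predictions into its most common label.

    Assumption: k is the voting window size (the abstract reports K = 13).
    A final window shorter than k is still voted on.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    decisions = []
    for start in range(0, len(predictions), k):
        window = predictions[start:start + k]
        # most_common(1) returns the label with the highest count in the window.
        label, _count = Counter(window).most_common(1)[0]
        decisions.append(label)
    return decisions


# Hypothetical per-packet labels, voted on in windows of 3 for brevity.
labels = ["video", "video", "voip", "voip", "voip", "video"]
decisions = majority_vote(labels, k=3)  # ["video", "voip"]
```

An odd window size such as 13 avoids ties when only two classes compete, which may be one reason to prefer it, though the abstract does not say so.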


2014 ◽  
Vol 602-605 ◽  
pp. 1933-1937
Author(s):  
Lian Fa Wu

In recent years, Internet traffic classification using machine learning has been a hot topic, and supervised learning methods, including the Support Vector Machine (SVM), have been used to identify Internet traffic in many papers. Supervised learning methods need many labeled instances to train the classification model, but labeling instances is difficult because much traffic is encrypted. Semi-supervised learning methods can use both labeled and unlabeled instances to train the classification model, which makes them well suited to P2P traffic identification. The transductive support vector machine (TSVM) is a typical semi-supervised learning method. Based on theoretical analysis and experiments, we compare the accuracy of TSVM and SVM. The experimental results show that semi-supervised methods have some advantages in identifying P2P traffic.

