A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pig and Typical Mapreduce

Supervised learning techniques require a labeled response variable in the training data, a requirement that may not be met in some situations, particularly in dynamic, short-term, and ad-hoc wireless network access environments. Classifying traffic without a labeled response variable is therefore a central challenge in modern network security and intrusion detection. In this chapter we discuss some unsupervised learning techniques, including probability, similarity, and multidimensional models, that can be applied in network security. These methods also provide a different angle from which to analyze network traffic data. For comprehensive knowledge on unsupervised learning techniques please refer to the machine learning references listed in the previous chapter; for their applications in network security see Carmines, Edward & McIver (1981), Lane & Brodley (1997), Herrero, Corchado, Gastaldo, Leoncini, Picasso & Zunino (2007), and Dhanalakshmi & Babu (2008). In supervised learning, each vector $X = (x_1, x_2, \ldots, x_n)$ has a corresponding observed response, Y. In unsupervised learning we have only X; Y is not available, either because we could not observe it or because it occurs too rarely to be fitted with a supervised learning approach. Unsupervised learning is of great practical value because in many circumstances the available network traffic data may not include any anomalous events, or any known anomalous events (e.g., traffic collected from a newly constructed network system). As high-speed mobile wireless and ad-hoc network systems have become popular, the need for new unsupervised learning methods that can model network traffic using anomaly-free training data has increased significantly.
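As a minimal sketch of the similarity-based idea, assuming training data that contains only normal traffic, the example below learns a per-feature baseline from X alone and scores new vectors by their standardized distance from it. The feature names, sample values, and threshold rule are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
import math
from statistics import mean, pstdev


def fit_baseline(train):
    """Learn per-feature mean and standard deviation from anomaly-free vectors."""
    columns = list(zip(*train))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when a feature has zero variance, so division is always defined.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def anomaly_score(x, means, stds):
    """Euclidean distance of x from the baseline centre, in standardized units."""
    return math.sqrt(sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, means, stds)))


# Hypothetical features per time window:
# (packets per second, mean packet size in bytes, distinct destination ports)
train = [
    (100.0, 500.0, 3.0),
    (110.0, 520.0, 4.0),
    (95.0, 480.0, 3.0),
    (105.0, 510.0, 5.0),
    (90.0, 490.0, 4.0),
]

means, stds = fit_baseline(train)

# Assumed decision rule: flag anything farther out than the most extreme training vector.
threshold = max(anomaly_score(x, means, stds) for x in train)

normal_like = (102.0, 505.0, 4.0)
scan_like = (900.0, 60.0, 200.0)

print(anomaly_score(normal_like, means, stds) > threshold)  # not flagged
print(anomaly_score(scan_like, means, stds) > threshold)    # flagged
```

No response variable Y appears anywhere in the sketch: the decision boundary is derived entirely from the observed X. In practice the threshold would more likely be set from a high percentile of training scores than from the maximum.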


MD-MinerP: Interaction Profiling Bipartite Graph Mining for Malware-Control Domain Detection

Security and Communication Networks, 2020, Vol 2020, pp. 1-20, DOI 10.1155/2020/8841544

Author(s): Tzung-Han Jeng, Yi-Ming Chen, Chien-Chih Chen, Chuan-Chiang Huang

Keyword(s): Network Traffic, Graph Mining, Single Point, Bipartite Graphs, Traffic Data, Domain Names, Advanced Persistent Threat, Threat Intelligence, Extraction Stage

Despite the efforts of information security experts, cybercrimes are still emerging at an alarming rate. Malicious domains are an indispensable tool for cybercriminals, and harm originating from the Internet has become a global problem. Malicious domains play an important role in threats ranging from spam and Cross-Site Scripting (XSS) to botnets and large-scale Advanced Persistent Threat (APT) attacks. To avoid a single point of failure and to evade detection and blocking, malware authors employ domain generation algorithms (DGAs) and domain-flux techniques to generate large numbers of domain names for malicious servers. As a result, malicious servers are difficult to detect and remove. Furthermore, the clues to cybercrime are stored in network traffic logs, but analyzing large volumes of long-term network traffic data is a challenge. To keep pace with evolving cybercrime techniques and automatically detect unknown malicious threats, we previously proposed a system called MD-Miner. To improve its efficiency and accuracy, we propose MD-MinerP here, which generates more discriminative features in the feature extraction stage. Moreover, MD-MinerP adopts interaction profiling bipartite graphs instead of annotated bipartite graphs. The experimental results show that MD-MinerP achieves better area under curve (AUC) results and finds new malicious domains that could not be recognized by other threat intelligence systems. MD-MinerP exhibits both scalability and applicability, which has been experimentally validated on actual enterprise network traffic.
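To make the bipartite-graph idea concrete, the following is a rough sketch of how a host-domain bipartite graph might be built from traffic log records and queried for simple per-domain features. It is not the MD-MinerP feature set, which the abstract does not specify; the record format, function names, and the two features shown are assumptions made for illustration.

```python
from collections import defaultdict


def build_bipartite(records):
    """Build host->domains and domain->hosts adjacency maps from (host, domain) pairs."""
    host_to_domains = defaultdict(set)
    domain_to_hosts = defaultdict(set)
    for host, domain in records:
        host_to_domains[host].add(domain)
        domain_to_hosts[domain].add(host)
    return host_to_domains, domain_to_hosts


def domain_features(domain, host_to_domains, domain_to_hosts):
    """Return two illustrative (hypothetical) features for one domain node."""
    hosts = domain_to_hosts.get(domain, set())
    # Domains reachable in two hops: other domains contacted by the same hosts.
    neighbours = set().union(*(host_to_domains[h] for h in hosts)) - {domain}
    return {"host_count": len(hosts), "co_queried_domains": len(neighbours)}


# Hypothetical log records: (internal host, queried domain)
records = [
    ("10.0.0.1", "example.com"),
    ("10.0.0.2", "example.com"),
    ("10.0.0.2", "example.org"),
    ("10.0.0.3", "example.net"),
]

host_to_domains, domain_to_hosts = build_bipartite(records)
print(domain_features("example.com", host_to_domains, domain_to_hosts))
print(domain_features("example.net", host_to_domains, domain_to_hosts))
```

Features of this kind could then be fed to a classifier and evaluated with AUC, as the abstract describes for the actual system.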
