scholarly journals Midas: Microcluster-Based Detector of Anomalies in Edge Streams

2020 ◽  
Vol 34 (04) ◽  
pp. 3242-3249 ◽  
Author(s):  
Siddharth Bhatia ◽  
Bryan Hooi ◽  
Minji Yoon ◽  
Kijung Shin ◽  
Christos Faloutsos

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose Midas, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. Midas has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 108–505 times faster than state-of-the-art approaches; (c) it provides 46%-52% higher accuracy (in terms of AUC) than state-of-the-art approaches.

2022 ◽  
Vol 16 (4) ◽  
pp. 1-22
Author(s):  
Siddharth Bhatia ◽  
Rui Liu ◽  
Bryan Hooi ◽  
Minji Yoon ◽  
Kijung Shin ◽  
...  

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose Midas , which focuses on detecting microcluster anomalies , or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. We further propose Midas -F, to solve the problem by which anomalies are incorporated into the algorithm’s internal states, creating a “poisoning” effect that can allow future anomalies to slip through undetected. Midas -F introduces two modifications: (1) we modify the anomaly scoring function, aiming to reduce the “poisoning” effect of newly arriving edges; (2) we introduce a conditional merge step, which updates the algorithm’s data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the “poisoning” effect. Experiments show that Midas -F has significantly higher accuracy than Midas . In general, the algorithms proposed in this work have the following properties: (a) they detects microcluster anomalies while providing theoretical guarantees about the false positive probability; (b) they are online, thus processing each edge in constant time and constant memory, and also processes the data orders-of-magnitude faster than state-of-the-art approaches; and (c) they provides up to 62% higher area under the receiver operating characteristic curve than state-of-the-art approaches.


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Wei Li ◽  
Kun Huang ◽  
Dafang Zhang ◽  
Zheng Qin

Bloom filters are space-efficient randomized data structures for fast membership queries, allowing false positives. Counting Bloom Filters (CBFs) perform the same operations on dynamic sets that can be updated via insertions and deletions. CBFs have been extensively used in MapReduce to accelerate large-scale data processing on large clusters by reducing the volume of datasets. The false positive probability of CBF should be made as low as possible for filtering out more redundant datasets. In this paper, we propose a multilevel optimization approach to building an Accurate Counting Bloom Filter (ACBF) for reducing the false positive probability. ACBF is constructed by partitioning the counter vector into multiple levels. We propose an optimized ACBF by maximizing the first level size, in order to minimize the false positive probability while maintaining the same functionality as CBF. Simulation results show that the optimized ACBF reduces the false positive probability by up to 98.4% at the same memory consumption compared to CBF. We also implement ACBFs in MapReduce to speed up the reduce-side join. Experiments on realistic datasets show that ACBF reduces the false positive probability by 72.3% as well as the map outputs by 33.9% and improves the join execution times by 20% compared to CBF.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Nouar AlDahoul ◽  
Hezerul Abdul Karim ◽  
Abdulaziz Saleh Ba Wazir

AbstractNetwork Anomaly Detection is still an open challenging task that aims to detect anomalous network traffic for security purposes. Usually, the network traffic data are large-scale and imbalanced. Additionally, they have noisy labels. This paper addresses the previous challenges and utilizes million-scale and highly imbalanced ZYELL’s dataset. We propose to train deep neural networks with class weight optimization to learn complex patterns from rare anomalies observed from the traffic data. This paper proposes a novel model fusion that combines two deep neural networks including binary normal/attack classifier and multi-attacks classifier. The proposed solution can detect various network attacks such as Distributed Denial of Service (DDOS), IP probing, PORT probing, and Network Mapper (NMAP) probing. The experiments conducted on a ZYELL’s real-world dataset show promising performance. It was found that the proposed approach outperformed the baseline model in terms of average macro Fβ score and false alarm rate by 17% and 5.3%, respectively.


2018 ◽  
Vol 2 (4) ◽  
pp. 37 ◽  
Author(s):  
Shayan Taheri ◽  
Milad Salem ◽  
Jiann-Shiun Yuan

The advancements in the Internet has enabled connecting more devices into this technology every day. The emergence of the Internet of Things has aggregated this growth. Lack of security in an IoT world makes these devices hot targets for cyber criminals to perform their malicious actions. One of these actions is the Botnet attack, which is one of the main destructive threats that has been evolving since 2003 into different forms. This attack is a serious threat to the security and privacy of information. Its scalability, structure, strength, and strategy are also under successive development, and that it has survived for decades. A bot is defined as a software application that executes a number of automated tasks (simple but structurally repetitive) over the Internet. Several bots make a botnet that infects a number of devices and communicates with their controller called the botmaster to get their instructions. A botnet executes tasks with a rate that would be impossible to be done by a human being. Nowadays, the activities of bots are concealed in between the normal web flows and occupy more than half of all web traffic. The largest use of bots is in web spidering (web crawler), in which an automated script fetches, analyzes, and files information from web servers. They also contribute to other attacks, such as distributed denial of service (DDoS), SPAM, identity theft, phishing, and espionage. A number of botnet detection techniques have been proposed, such as honeynet-based and Intrusion Detection System (IDS)-based. These techniques are not effective anymore due to the constant update of the bots and their evasion mechanisms. Recently, botnet detection techniques based upon machine/deep learning have been proposed that are more capable in comparison to their previously mentioned counterparts. In this work, we propose a deep learning-based engine for botnet detection to be utilized in the IoT and the wearable devices. In this system, the normal and botnet network traffic data are transformed into image before being given into a deep convolutional neural network, named DenseNet with and without considering transfer learning. The system is implemented using Python programming language and the CTU-13 Dataset is used for evaluation in one study. According to our simulation results, using transfer learning can improve the accuracy from 33.41% up to 99.98%. In addition, two other classifiers of Support Vector Machine (SVM) and logistic regression have been used. They showed an accuracy of 83.15% and 78.56%, respectively. In another study, we evaluate our system by an in-house live normal dataset and a solely botnet dataset. Similarly, the system performed very well in data classification in these studies. To examine the capability of our system for real-time applications, we measure the system training and testing times. According to our examination, it takes 0.004868 milliseconds to process each packet from the network traffic data during testing.


Sign in / Sign up

Export Citation Format

Share Document