Midas: Microcluster-Based Detector of Anomalies in Edge Streams

Siddharth Bhatia; Bryan Hooi; Minji Yoon; Kijung Shin; Christos Faloutsos

doi:10.1609/aaai.v34i04.5724

Real-Time Anomaly Detection in Edge Streams

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3494564 ◽

2022 ◽

Vol 16 (4) ◽

pp. 1-22

Author(s):

Siddharth Bhatia ◽

Rui Liu ◽

Bryan Hooi ◽

Minji Yoon ◽

Kijung Shin ◽

...

Keyword(s):

State Of The Art ◽

Characteristic Curve ◽

Denial Of Service ◽

Scoring Function ◽

Threshold Value ◽

Constant Time ◽

Dynamic Graph ◽

Unusual Behavior ◽

Poisoning Effect ◽

Anomaly Score

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose Midas, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. We further propose Midas-F, to solve the problem by which anomalies are incorporated into the algorithm’s internal states, creating a “poisoning” effect that can allow future anomalies to slip through undetected. Midas-F introduces two modifications: (1) we modify the anomaly scoring function, aiming to reduce the “poisoning” effect of newly arriving edges; (2) we introduce a conditional merge step, which updates the algorithm’s data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the “poisoning” effect. Experiments show that Midas-F has significantly higher accuracy than Midas. In general, the algorithms proposed in this work have the following properties: (a) they detect microcluster anomalies while providing theoretical guarantees about the false positive probability; (b) they are online, processing each edge in constant time and constant memory, and they process the data orders of magnitude faster than state-of-the-art approaches; and (c) they provide up to 62% higher area under the receiver operating characteristic curve than state-of-the-art approaches.
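The idea of scoring each arriving edge in constant time and memory can be illustrated with a minimal Python sketch. The abstract does not spell out the data structures or the scoring function, so the count-min sketch, the chi-squared-style statistic, and every class, method, and parameter name below are assumptions made for illustration. The sketch also omits the Midas-F modifications (the altered score and the conditional merge step).

```python
import random


class CountMinSketch:
    """Fixed-size approximate counter keyed by (source, destination) edges."""

    def __init__(self, rows=2, cols=1024, seed=0):
        rng = random.Random(seed)
        self.cols = cols
        # One random salt per row stands in for an independent hash function.
        self.salts = [rng.getrandbits(32) for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def _cells(self, key):
        return [hash((salt, key)) % self.cols for salt in self.salts]

    def add(self, key, amount=1.0):
        for row, col in zip(self.table, self._cells(key)):
            row[col] += amount

    def estimate(self, key):
        # The minimum over rows limits the effect of hash collisions.
        return min(row[col] for row, col in zip(self.table, self._cells(key)))

    def clear(self):
        for row in self.table:
            for i in range(self.cols):
                row[i] = 0.0


class MicroclusterScorer:
    """Hypothetical scorer: compares an edge's count in the current tick
    with its average count per tick so far (an assumed statistic)."""

    def __init__(self, rows=2, cols=1024):
        self.current = CountMinSketch(rows, cols, seed=1)  # current tick only
        self.total = CountMinSketch(rows, cols, seed=2)    # all ticks so far
        self.tick = 1

    def score(self, src, dst, tick):
        if tick > self.tick:
            # A new time tick starts: reset the per-tick counts.
            self.current.clear()
            self.tick = tick
        edge = (src, dst)
        self.current.add(edge)
        self.total.add(edge)
        a = self.current.estimate(edge)  # occurrences in the current tick
        s = self.total.estimate(edge)    # occurrences over all ticks
        t = self.tick
        if t == 1 or s == 0:
            return 0.0
        mean = s / t
        # Chi-squared-style deviation of the current count from the mean.
        return (a - mean) ** 2 * t * t / (s * (t - 1))


scorer = MicroclusterScorer()
steady = [scorer.score(1, 2, tick) for tick in range(1, 10)]  # one edge per tick
burst = [scorer.score(1, 2, 10) for _ in range(20)]           # sudden burst
```

In this sketch, an edge that arrives once per tick keeps a score of zero, while the score grows quickly as the same edge repeats within a single tick. Memory stays fixed because both sketches have a constant size regardless of how many edges have been seen.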


Image Watermarking Scheme for Specifying False Positive Probability and Bit-pattern Embedding

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.132.932 ◽

2012 ◽

Vol 132 (6) ◽

pp. 932-939

Author(s):

Kohei Sayama ◽

Masayoshi Nakamoto ◽

Mitsuji Muneyasu ◽

Shuichi Ohno

Keyword(s):

False Positive ◽

Image Watermarking ◽

Positive Probability ◽

Watermarking Scheme ◽

False Positive Probability


True and False Positive Probability Estimation of Magnitude Crosscorrelation Coefficient by Using the Rayleigh Distribution

NeuroImage ◽

10.1016/s1053-8119(18)31580-5 ◽

1998 ◽

Vol 7 (4) ◽

pp. S747

Author(s):

K. Sekihara ◽

H. Kikyo ◽

K. Nakajima ◽

H. Koizumi ◽

Y. Miyashita

Keyword(s):

False Positive ◽

Rayleigh Distribution ◽

Probability Estimation ◽

Positive Probability ◽

False Positive Probability


False-Positive Probability and Compression Optimization for Tree-Structured Bloom Filters

ACM Transactions on Modeling and Performance Evaluation of Computing Systems ◽

10.1145/2940324 ◽

2016 ◽

Vol 1 (4) ◽

pp. 1-39

Author(s):

Yongquan Fu ◽

Ernst Biersack

Keyword(s):

False Positive ◽

Bloom Filters ◽

Positive Probability ◽

False Positive Probability


Accurate Counting Bloom Filters for Large-Scale Data Processing

Mathematical Problems in Engineering ◽

10.1155/2013/516298 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11 ◽

Cited By ~ 6

Author(s):

Wei Li ◽

Kun Huang ◽

Dafang Zhang ◽

Zheng Qin

Keyword(s):

Data Processing ◽

False Positive ◽

Large Scale ◽

Bloom Filters ◽

Positive Probability ◽

Large Scale Data ◽

Large Scale Data Processing ◽

False Positive Probability ◽

Scale Data ◽

Counting Bloom Filters

Bloom filters are space-efficient randomized data structures for fast membership queries, allowing false positives. Counting Bloom Filters (CBFs) perform the same operations on dynamic sets that can be updated via insertions and deletions. CBFs have been extensively used in MapReduce to accelerate large-scale data processing on large clusters by reducing the volume of datasets. The false positive probability of a CBF should be made as low as possible so that more redundant data can be filtered out. In this paper, we propose a multilevel optimization approach to building an Accurate Counting Bloom Filter (ACBF) that reduces the false positive probability. An ACBF is constructed by partitioning the counter vector into multiple levels. We propose an optimized ACBF that maximizes the first-level size in order to minimize the false positive probability while maintaining the same functionality as a CBF. Simulation results show that the optimized ACBF reduces the false positive probability by up to 98.4% at the same memory consumption compared to a CBF. We also implement ACBFs in MapReduce to speed up the reduce-side join. Experiments on realistic datasets show that, compared to a CBF, the ACBF reduces the false positive probability by 72.3% and the map outputs by 33.9%, and improves the join execution times by 20%.
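For readers unfamiliar with the baseline structure, the following Python sketch shows a plain Counting Bloom Filter with insertion, deletion, and membership queries. It is not the multilevel ACBF proposed in the paper. The class name, the salted SHA-256 hashing, the default sizes, and the helper `false_positive_probability` (the textbook approximation for a standard Bloom filter) are all assumptions chosen for illustration.

```python
import hashlib
import math


class CountingBloomFilter:
    """Plain counting Bloom filter: counters instead of bits allow deletion."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, item):
        # Derive k counter indexes from salted SHA-256 digests
        # (an illustrative choice of hash family).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def remove(self, item):
        # Refuse to remove items that do not appear present,
        # so that no counter can drop below zero.
        if item not in self:
            return False
        for idx in self._indexes(item):
            self.counters[idx] -= 1
        return True

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.counters[idx] > 0 for idx in self._indexes(item))


def false_positive_probability(m, k, n):
    """Textbook approximation for m counters, k hashes, n stored items."""
    return (1.0 - math.exp(-k * n / m)) ** k


cbf = CountingBloomFilter()
cbf.add("alice")
cbf.add("bob")
cbf.remove("alice")
```

The approximation makes the trade-off in the abstract concrete: for a fixed memory budget `m`, the false positive probability rises as more items `n` are stored, which is why reducing it at the same memory consumption is the stated goal.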


Model fusion of deep neural networks for anomaly detection

Journal Of Big Data ◽

10.1186/s40537-021-00496-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Nouar AlDahoul ◽

Hezerul Abdul Karim ◽

Abdulaziz Saleh Ba Wazir

Keyword(s):

Neural Networks ◽

Anomaly Detection ◽

Network Traffic ◽

Large Scale ◽

Deep Neural Networks ◽

Denial Of Service ◽

Traffic Data ◽

Model Fusion ◽

Class Weight ◽

Network Anomaly Detection

Network anomaly detection is still an open and challenging task that aims to detect anomalous network traffic for security purposes. Usually, network traffic data are large-scale and imbalanced. Additionally, they have noisy labels. This paper addresses these challenges using ZYELL’s million-scale, highly imbalanced dataset. We propose to train deep neural networks with class weight optimization to learn complex patterns from rare anomalies observed in the traffic data. This paper proposes a novel model fusion that combines two deep neural networks: a binary normal/attack classifier and a multi-attack classifier. The proposed solution can detect various network attacks such as Distributed Denial of Service (DDoS), IP probing, port probing, and Network Mapper (NMAP) probing. The experiments conducted on ZYELL’s real-world dataset show promising performance. The proposed approach outperformed the baseline model in terms of average macro Fβ score and false alarm rate by 17% and 5.3%, respectively.
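The two evaluation metrics named in the abstract can be computed as in the Python sketch below. The abstract does not state the value of β or how the false alarm rate is defined, so the default `beta=1.0`, the definition of false alarm rate as the fraction of normal samples flagged as attacks, the function names, and the toy labels are all assumptions.

```python
def macro_fbeta(y_true, y_pred, beta=1.0):
    """Unweighted mean of per-class F-beta scores."""
    labels = sorted(set(y_true) | set(y_pred))
    b2 = beta * beta
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = (1 + b2) * tp + b2 * fn + fp
        # A class with no true or predicted samples contributes 0.
        scores.append((1 + b2) * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def false_alarm_rate(y_true, y_pred, normal_label="normal"):
    """Assumed definition: share of normal samples predicted as an attack."""
    normal_preds = [p for t, p in zip(y_true, y_pred) if t == normal_label]
    if not normal_preds:
        return 0.0
    return sum(1 for p in normal_preds if p != normal_label) / len(normal_preds)


# Toy, made-up labels for illustration only.
y_true = ["normal", "normal", "normal", "normal", "ddos", "ddos"]
y_pred = ["normal", "normal", "normal", "ddos", "ddos", "normal"]
score = macro_fbeta(y_true, y_pred)
alarm = false_alarm_rate(y_true, y_pred)
```

Macro averaging gives each class equal weight regardless of how many samples it has, which is why it suits a highly imbalanced dataset where rare attack classes would otherwise be drowned out by normal traffic.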


Deep Metric Learning with False Positive Probability

Neural Information Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-70090-8_66 ◽

2017 ◽

pp. 653-664 ◽

Cited By ~ 2

Author(s):

Jia-Xing Zhong ◽

Ge Li ◽

Nannan Li

Keyword(s):

False Positive ◽

Metric Learning ◽

Positive Probability ◽

Deep Metric Learning ◽

False Positive Probability


Tree-structured Bloom Filters for Joint Optimization of False Positive Probability and Transmission Bandwidth

Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems - SIGMETRICS '15 ◽

10.1145/2745844.2745881 ◽

2015 ◽

Author(s):

Yongquan Fu ◽

Ernst Biersack

Keyword(s):

False Positive ◽

Bloom Filters ◽

Joint Optimization ◽

Positive Probability ◽

Transmission Bandwidth ◽

False Positive Probability


Leveraging Image Representation of Network Traffic Data and Transfer Learning in Botnet Detection

Big Data and Cognitive Computing ◽

10.3390/bdcc2040037 ◽

2018 ◽

Vol 2 (4) ◽

pp. 37 ◽

Cited By ~ 2

Author(s):

Shayan Taheri ◽

Milad Salem ◽

Jiann-Shiun Yuan

Keyword(s):

Deep Learning ◽

Transfer Learning ◽

Network Traffic ◽

Denial Of Service ◽

Security And Privacy ◽

Support Vector ◽

The Internet ◽

Traffic Data ◽

Botnet Detection ◽

Detection Techniques

Advances in the Internet have enabled more devices to connect to it every day. The emergence of the Internet of Things has compounded this growth. Lack of security in an IoT world makes these devices hot targets for cyber criminals to perform their malicious actions. One of these actions is the botnet attack, one of the main destructive threats, which has been evolving since 2003 into different forms. This attack is a serious threat to the security and privacy of information. Its scalability, structure, strength, and strategy are also under continuous development, and it has survived for decades. A bot is defined as a software application that executes a number of automated tasks (simple but structurally repetitive) over the Internet. Several bots make a botnet that infects a number of devices and communicates with its controller, called the botmaster, to get instructions. A botnet executes tasks at a rate that would be impossible for a human being. Nowadays, the activities of bots are concealed among normal web flows and account for more than half of all web traffic. The largest use of bots is in web spidering (web crawling), in which an automated script fetches, analyzes, and files information from web servers. Bots also contribute to other attacks, such as distributed denial of service (DDoS), spam, identity theft, phishing, and espionage. A number of botnet detection techniques have been proposed, such as honeynet-based and Intrusion Detection System (IDS)-based techniques. These techniques are no longer effective because of the constant updating of bots and their evasion mechanisms. Recently, botnet detection techniques based on machine/deep learning have been proposed that are more capable than their previously mentioned counterparts. In this work, we propose a deep learning-based engine for botnet detection to be used in IoT and wearable devices. In this system, the normal and botnet network traffic data are transformed into images before being fed into a deep convolutional neural network, DenseNet, with and without transfer learning. The system is implemented in the Python programming language, and the CTU-13 dataset is used for evaluation in one study. According to our simulation results, using transfer learning can improve the accuracy from 33.41% up to 99.98%. In addition, two other classifiers, a Support Vector Machine (SVM) and logistic regression, have been used. They showed an accuracy of 83.15% and 78.56%, respectively. In another study, we evaluate our system with an in-house live normal dataset and a solely botnet dataset. Similarly, the system performed very well in data classification in these studies. To examine the capability of our system for real-time applications, we measure the system training and testing times. According to our examination, it takes 0.004868 milliseconds to process each packet from the network traffic data during testing.
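The step of turning traffic data into an image can be pictured with the short Python sketch below. The abstract does not describe the actual encoding, so the square grayscale layout, the zero padding, the 32-pixel side length, and both function names are assumptions. The second helper only converts the reported per-packet time into a throughput figure.

```python
def bytes_to_image(payload, side=32):
    """Map raw bytes to a side x side grid of pixel values in 0..255.

    Assumed encoding: truncate or zero-pad to side * side bytes,
    then fill the grid row by row.
    """
    size = side * side
    padded = payload[:size].ljust(size, b"\x00")
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]


def packets_per_second(milliseconds_per_packet):
    """Convert a per-packet processing time into packets per second."""
    return 1000.0 / milliseconds_per_packet


# Made-up payload bytes for illustration only.
image = bytes_to_image(b"GET /index.html HTTP/1.1\r\n", side=8)
throughput = packets_per_second(0.004868)
```

At the reported 0.004868 milliseconds per packet, the conversion gives a testing throughput of roughly 205,000 packets per second, which is the basis for the real-time claim.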


Attack on Cocktail Watermarking Based on High False Positive Probability

Second Workshop on Digital Media and its Application in Museum & Heritages (DMAMH 2007) ◽

10.1109/dmamh.2007.66 ◽

2007 ◽

Author(s):

Jie Jiang ◽

Dr Chen ◽

Dan Zhang ◽

Jianjun Guan

Keyword(s):

False Positive ◽

Positive Probability ◽

False Positive Probability
