Real-Time Anomaly Detection in Edge Streams

Siddharth Bhatia; Rui Liu; Bryan Hooi; Minji Yoon; Kijung Shin; Christos Faloutsos

doi:10.1145/3494564

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3494564 ◽

2022 ◽

Vol 16 (4) ◽

pp. 1-22

Author(s):

Siddharth Bhatia ◽

Rui Liu ◽

Bryan Hooi ◽

Minji Yoon ◽

Kijung Shin ◽

...

Keyword(s):

State Of The Art ◽

Characteristic Curve ◽

Denial Of Service ◽

Scoring Function ◽

Threshold Value ◽

Constant Time ◽

Dynamic Graph ◽

Unusual Behavior ◽

Poisoning Effect ◽

Anomaly Score

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose Midas, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. We further propose Midas-F, to solve the problem by which anomalies are incorporated into the algorithm’s internal states, creating a “poisoning” effect that can allow future anomalies to slip through undetected. Midas-F introduces two modifications: (1) we modify the anomaly scoring function, aiming to reduce the “poisoning” effect of newly arriving edges; (2) we introduce a conditional merge step, which updates the algorithm’s data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the “poisoning” effect. Experiments show that Midas-F has significantly higher accuracy than Midas. In general, the algorithms proposed in this work have the following properties: (a) they detect microcluster anomalies while providing theoretical guarantees about the false positive probability; (b) they are online, processing each edge in constant time and constant memory, and process the data orders of magnitude faster than state-of-the-art approaches; and (c) they provide up to 62% higher area under the receiver operating characteristic curve than state-of-the-art approaches.
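
The idea of scoring edges online in constant time and memory can be illustrated with a rough sketch. The abstract does not give the scoring function or the data structures, so everything below is an assumption made for illustration: the class names `CountMinSketch` and `MicroclusterScorer` are hypothetical, edge counts are approximated with count-min sketches, ticks are taken to be integers starting at 1, and the score is a chi-squared-style comparison of an edge's count in the current tick against its mean count per tick so far. The conditional merge step of Midas-F is not modeled.

```python
import random


class CountMinSketch:
    """Fixed-size approximate counter (hypothetical helper for this sketch)."""

    def __init__(self, rows=2, cols=1024, seed=0):
        rng = random.Random(seed)
        self.cols = cols
        self.salts = [rng.getrandbits(32) for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def _cells(self, key):
        # One bucket per row, chosen by a salted hash of the key.
        return [(r, hash((salt, key)) % self.cols)
                for r, salt in enumerate(self.salts)]

    def add(self, key, amount=1.0):
        for r, c in self._cells(key):
            self.table[r][c] += amount

    def query(self, key):
        # The minimum over rows limits overcounting caused by collisions.
        return min(self.table[r][c] for r, c in self._cells(key))

    def clear(self):
        for row in self.table:
            for c in range(self.cols):
                row[c] = 0.0


class MicroclusterScorer:
    """Scores each arriving edge using a fixed amount of memory."""

    def __init__(self, rows=2, cols=1024):
        self.current = CountMinSketch(rows, cols, seed=1)  # counts in this tick
        self.total = CountMinSketch(rows, cols, seed=2)    # counts over all ticks
        self.tick = None

    def score(self, src, dst, tick):
        if tick != self.tick:
            # A new time tick begins: reset the per-tick counts.
            self.current.clear()
            self.tick = tick
        edge = (src, dst)
        self.current.add(edge)
        self.total.add(edge)
        a = self.current.query(edge)  # occurrences in the current tick
        s = self.total.query(edge)    # occurrences so far
        if tick <= 1 or s == 0:
            return 0.0
        # Assumed chi-squared-style statistic: current count vs. mean per tick.
        return (a - s / tick) ** 2 * tick ** 2 / (s * (tick - 1))
```

Under these assumptions, an edge that arrives once per tick keeps a score of zero, while a sudden burst of the same edge within one tick drives the score up quickly. The memory used depends only on `rows` and `cols`, not on the number of edges seen.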

Midas: Microcluster-Based Detector of Anomalies in Edge Streams

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5724 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3242-3249 ◽

Cited By ~ 3

Author(s):

Siddharth Bhatia ◽

Bryan Hooi ◽

Minji Yoon ◽

Kijung Shin ◽

Christos Faloutsos

Keyword(s):

Network Traffic ◽

False Positive ◽

State Of The Art ◽

Denial Of Service ◽

Constant Time ◽

Traffic Data ◽

Positive Probability ◽

Dynamic Graph ◽

Unusual Behavior ◽

False Positive Probability

The use of plasma donor-derived, cell-free DNA to monitor acute rejection after kidney transplantation

Nephrology Dialysis Transplantation ◽

10.1093/ndt/gfz091 ◽

2019 ◽

Vol 35 (4) ◽

pp. 714-721 ◽

Cited By ~ 10

Author(s):

Els M Gielis ◽

Kristien J Ledeganck ◽

Amélie Dendooven ◽

Pieter Meysman ◽

Charlie Beirnaert ◽

...

Keyword(s):

Acute Rejection ◽

Serum Creatinine ◽

Characteristic Curve ◽

Area Under The Curve ◽

Threshold Value ◽

Nucleotide Polymorphisms ◽

Kidney Transplant Recipients ◽

Cell Free Dna ◽

Free Dna ◽

Set Up

Background: After transplantation, cell-free deoxyribonucleic acid (DNA) derived from the donor organ (ddcfDNA) can be detected in the recipient’s circulation. We aimed to investigate the role of plasma ddcfDNA as a biomarker for acute kidney rejection. Methods: From 107 kidney transplant recipients, plasma samples were collected longitudinally after transplantation (Day 1 to 3 months) within a multicentre set-up. Cell-free DNA from the donor was quantified in plasma as a fraction of the total cell-free DNA by next-generation sequencing using a targeted, multiplex polymerase chain reaction-based method for the analysis of single nucleotide polymorphisms. Results: Increases of the ddcfDNA% above a threshold value of 0.88% were significantly associated with the occurrence of episodes of acute rejection (P = 0.017), acute tubular necrosis (P = 0.011) and acute pyelonephritis (P = 0.032). A receiver operating characteristic curve analysis revealed an equal area under the curve of 0.64 for the ddcfDNA% and serum creatinine in the diagnosis of acute rejection. Conclusions: Although increases in plasma ddcfDNA% are associated with graft injury, plasma ddcfDNA does not outperform the diagnostic capacity of serum creatinine in the diagnosis of acute rejection.

Can Computers Outperform Humans in Detecting User Zone-Outs? Implications for Intelligent Interfaces

ACM Transactions on Computer-Human Interaction ◽

10.1145/3481889 ◽

2022 ◽

Vol 29 (2) ◽

pp. 1-33

Author(s):

Nigel Bosch ◽

Sidney K. D'Mello

Keyword(s):

Operating Characteristic ◽

State Of The Art ◽

Characteristic Curve ◽

Ground Truth ◽

Mind Wandering ◽

High Stakes ◽

Machine Learning Model ◽

Improve Accuracy ◽

Comparable Accuracy ◽

Operating Characteristic Curve

The ability to identify whether a user is “zoning out” (mind wandering) from video has many HCI applications (e.g., distance learning, high-stakes vigilance tasks). However, it remains unknown how well humans can perform this task, how they compare to automatic computerized approaches, and how a fusion of the two might improve accuracy. We analyzed videos of users’ faces and upper bodies recorded 10 s prior to self-reported mind wandering (i.e., ground truth) while they engaged in a computerized reading task. We found that a state-of-the-art machine learning model had comparable accuracy to aggregated judgments of nine untrained human observers (area under receiver operating characteristic curve [AUC] = .598 versus .589). A fusion of the two (AUC = .644) outperformed each, presumably because each focused on complementary cues. Furthermore, adding more humans beyond 3–4 observers yielded diminishing returns. We discuss implications of human–computer fusion as a means to improve accuracy in complex tasks.

Towards Effective Detection of Recent DDoS Attacks: A Deep Learning Approach

Security and Communication Networks ◽

10.1155/2021/5710028 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Ivandro Ortet Lopes ◽

Deqing Zou ◽

Francis A Ruambo ◽

Saeed Akbar ◽

Bin Yuan

Keyword(s):

Deep Learning ◽

Intrusion Detection System ◽

State Of The Art ◽

Detection System ◽

Denial Of Service ◽

Model Performance ◽

Distributed Denial Of Service ◽

Ddos Attacks ◽

Validation Metrics ◽

High Processing

Distributed Denial of Service (DDoS) attacks are a predominant threat to the availability of online services due to their size and frequency. However, developing an effective security mechanism to protect a network from this threat is a big challenge because DDoS uses various attack approaches coupled with several possible combinations. Furthermore, most of the existing deep learning (DL)-based models pose a high processing overhead or may not perform well in detecting recently reported DDoS attacks, as these models use outdated datasets for training and evaluation. To address these issues, we propose CyDDoS, an integrated intrusion detection system (IDS) framework, which combines an ensemble of feature engineering algorithms with a deep neural network. The ensemble feature selection is based on five machine learning classifiers used to identify and extract the most relevant features used by the predictive model. This approach improves the model performance by processing only a subset of relevant features while reducing the computation requirement. We evaluate the model performance based on CICDDoS2019, a modern and realistic dataset consisting of normal and DDoS attack traffic. The evaluation considers different validation metrics such as accuracy, precision, F1-Score, and recall to argue the effectiveness of the proposed framework against state-of-the-art IDSs.

Multiresolution dendritic cell algorithm for network anomaly detection

PeerJ Computer Science ◽

10.7717/peerj-cs.749 ◽

2021 ◽

Vol 7 ◽

pp. e749

Author(s):

David Limon-Cantu ◽

Vicente Alarcon-Aquino

Keyword(s):

Information Systems ◽

Dendritic Cell ◽

Anomaly Detection ◽

State Of The Art ◽

Denial Of Service ◽

Series Data ◽

Time Frequency ◽

Computer Services ◽

Dendritic Cell Algorithm ◽

Network Anomaly Detection

Anomaly detection in computer networks is a complex task that requires the distinction of normality and anomaly. Network attack detection in information systems is a constant challenge in computer security research, as information systems provide essential services for enterprises and individuals. The consequences of these attacks could be the access, disclosure, or modification of information, as well as denial of computer services and resources. Intrusion Detection Systems (IDS) are developed as solutions to detect anomalous behavior, such as denial of service and backdoors. The proposed model was inspired by the behavior of dendritic cells and their interactions with the human immune system, known as Dendritic Cell Algorithm (DCA), and combines the use of Multiresolution Analysis (MRA) Maximal Overlap Discrete Wavelet Transform (MODWT), as well as the segmented deterministic DCA approach (S-dDCA). The proposed approach is a binary classifier that aims to analyze a time-frequency representation of time-series data obtained from high-level network features, in order to classify data as normal or anomalous. The MODWT was used to extract the approximations of two input signal categories at different levels of decomposition, which are used as processing elements for the multiresolution DCA. The model was evaluated using the NSL-KDD, UNSW-NB15, CIC-IDS2017 and CSE-CIC-IDS2018 datasets, containing contemporary network traffic and attacks. The proposed MRA S-dDCA model achieved an accuracy of 97.37%, 99.97%, 99.56%, and 99.75% for the tested datasets, respectively. Comparisons with the DCA and state-of-the-art approaches for network anomaly detection are presented. The proposed approach was able to surpass state-of-the-art approaches with the UNSW-NB15 and CSE-CIC-IDS2018 datasets, whereas the results obtained with the NSL-KDD and CIC-IDS2017 datasets are competitive with machine learning approaches.

Low-Rate DoS Attacks Detection Based on MAF-ADM

Sensors ◽

10.3390/s20010189 ◽

2019 ◽

Vol 20 (1) ◽

pp. 189 ◽

Cited By ~ 4

Author(s):

Sijia Zhan ◽

Dan Tang ◽

Jianping Man ◽

Rui Dai ◽

Xiyin Wang

Keyword(s):

Joint Distribution ◽

Transmission Control Protocol ◽

Denial Of Service ◽

False Negative ◽

False Negative Rate ◽

Network Simulator ◽

Dos Attacks ◽

Time Frequency ◽

Anomaly Score ◽

Low Rate

Low-rate denial of service (LDoS) attacks reduce the quality of network service by sending periodic packet bursts to bottleneck routers. They are difficult for counter-DoS mechanisms to detect because of their stealthy behavior and low average attack traffic. In this paper, we propose an anomaly detection method based on adaptive fusion of multiple features (MAF-ADM) for LDoS attacks. This study is based on the fact that the time-frequency joint distribution of legitimate transmission control protocol (TCP) traffic changes under LDoS attacks. Several statistical metrics of the time-frequency joint distribution are chosen to generate isolation trees, which can simultaneously reflect anomalies in the time domain and the frequency domain. Then we calculate an anomaly score by fusing the results of all isolation trees according to their ability to isolate samples containing LDoS attacks. Finally, the anomaly score is smoothed by a weighted moving average algorithm to avoid errors caused by noise in the network. Experimental results on Network Simulator 2 (NS2), a testbed, and public datasets (WIDE2018 and LBNL) demonstrate that this method detects LDoS attacks effectively with a lower false negative rate.
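
The final smoothing step mentioned above can be sketched as follows. This is only an illustration of a generic weighted moving average: the abstract does not state the window length or the weights, so the three-sample window and the linearly increasing weights below are assumptions, and the function name `weighted_moving_average` is hypothetical.

```python
def weighted_moving_average(scores, weights=(1.0, 2.0, 3.0)):
    """Smooth a sequence of anomaly scores.

    `weights` runs from the oldest to the newest sample in the window;
    the default values are assumed for illustration, not taken from the paper.
    """
    smoothed = []
    for i in range(len(scores)):
        # Use the samples available so far, up to the window length.
        window = scores[max(0, i - len(weights) + 1): i + 1]
        w = weights[-len(window):]  # align weights with the newest samples
        smoothed.append(sum(x * wi for x, wi in zip(window, w)) / sum(w))
    return smoothed
```

With these assumed weights, an isolated spike such as `[0, 0, 6, 0, 0]` is damped to `[0, 0, 3, 2, 1]`, so a single noisy sample is less likely to cross a detection threshold on its own.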

Multi-Hazard Exposure Mapping Using Machine Learning Techniques: A Case Study from Iran

Remote Sensing ◽

10.3390/rs11161943 ◽

2019 ◽

Vol 11 (16) ◽

pp. 1943 ◽

Cited By ~ 15

Author(s):

Omid Rahmati ◽

Saleh Yousefi ◽

Zahra Kalantari ◽

Evelyn Uuemaa ◽

Teimur Teimurian ◽

...

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Characteristic Curve ◽

Machine Learning Techniques ◽

Support Vector ◽

Mountainous Area ◽

Data Set ◽

Boosted Regression Tree ◽

Hazard Exposure ◽

Learning Techniques

Mountainous areas are highly prone to a variety of nature-triggered disasters, which often cause disabling harm, death, destruction, and damage. In this work, an attempt was made to develop an accurate multi-hazard exposure map for a mountainous area (Asara watershed, Iran), based on state-of-the-art machine learning techniques. Hazard modeling for avalanches, rockfalls, and floods was performed using three state-of-the-art models: support vector machine (SVM), boosted regression tree (BRT), and generalized additive model (GAM). Topo-hydrological and geo-environmental factors were used as predictors in the models. A flood dataset (n = 133 flood events) was applied, which had been prepared using Sentinel-1-based processing and ground-based information. In addition, snow avalanche (n = 58) and rockfall (n = 101) data sets were used. The data set of each hazard type was randomly divided into two groups: training (70%) and validation (30%). Model performance was evaluated by the true skill score (TSS) and the area under receiver operating characteristic curve (AUC) criteria. Using an exposure map, the multi-hazard map was converted into a multi-hazard exposure map. According to both validation methods, the SVM model showed the highest accuracy for avalanches (AUC = 92.4%, TSS = 0.72) and rockfalls (AUC = 93.7%, TSS = 0.81), while BRT demonstrated the best performance for flood hazards (AUC = 94.2%, TSS = 0.80). Overall, multi-hazard exposure modeling revealed that valleys and areas close to the Chalous Road, one of the most important roads in Iran, were associated with high and very high levels of risk. The proposed multi-hazard exposure framework can be helpful in supporting decision making on mountain social-ecological systems facing multiple hazards.
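
The two evaluation criteria named above can be computed as in the sketch below, which uses the standard definitions: TSS is sensitivity plus specificity minus one, and AUC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half). The function names and the toy labels are illustrative assumptions and are not taken from the study.

```python
def true_skill_score(y_true, y_pred):
    """TSS = sensitivity + specificity - 1, for binary labels (1 = hazard)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for validation sets of the size reported here; a rank-based formulation would be preferable for large data sets. Both functions assume that each class is present at least once.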

Indoor versus outdoor scene recognition for navigation of a micro aerial vehicle using spatial color gist wavelet descriptors

Visual Computing for Industry Biomedicine and Art ◽

10.1186/s42492-019-0030-9 ◽

2019 ◽

Vol 2 (1) ◽

Cited By ~ 2

Author(s):

Anitha Ganesan ◽

Anbarasu Balasubramanian

Keyword(s):

State Of The Art ◽

Characteristic Curve ◽

Scene Recognition ◽

Color Histogram ◽

Support Vector ◽

Svm Classifier ◽

Scene Classification ◽

Visual Descriptors ◽

Outdoor Scenes ◽

Institute Of Technology

In the context of improved navigation for micro aerial vehicles, a new scene recognition visual descriptor, called spatial color gist wavelet descriptor (SCGWD), is proposed. SCGWD was developed by combining proposed Ohta color-GIST wavelet descriptors with census transform histogram (CENTRIST) spatial pyramid representation descriptors for categorizing indoor versus outdoor scenes. A binary and multiclass support vector machine (SVM) classifier with linear and non-linear kernels was used to classify indoor versus outdoor scenes and indoor scenes, respectively. In this paper, we have also discussed the feature extraction methodology of several state-of-the-art visual descriptors and four proposed visual descriptors (Ohta color-GIST descriptors, Ohta color-GIST wavelet descriptors, enhanced Ohta color histogram descriptors, and SCGWDs) from an experimental perspective. The proposed enhanced Ohta color histogram descriptors, Ohta color-GIST descriptors, Ohta color-GIST wavelet descriptors, SCGWD, and state-of-the-art visual descriptors were evaluated using the Indian Institute of Technology Madras Scene Classification Image Database two (IITM SCID2), an Indoor-Outdoor Dataset, and the Massachusetts Institute of Technology indoor scene classification dataset (MIT-67). Experimental results showed that the indoor versus outdoor scene recognition algorithm, employing SVM with SCGWDs, produced the highest classification rates (CRs): 95.48% and 99.82% using the radial basis function (RBF) kernel and 95.29% and 99.45% using the linear kernel for the IITM SCID2 and Indoor-Outdoor datasets, respectively. The lowest CRs (2.08% and 4.92%, respectively) were obtained when RBF and linear kernels were used with the MIT-67 dataset. In addition, higher CRs, precision, recall, and area under the receiver operating characteristic curve values were obtained for the proposed SCGWDs, in comparison with state-of-the-art visual descriptors.

DGCN: Dynamic Graph Convolutional Network for Efficient Multi-Person Pose Estimation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6867 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11924-11931

Author(s):

Zhongwei Qiu ◽

Kai Qiu ◽

Jianlong Fu ◽

Dongmei Fu

Keyword(s):

Pose Estimation ◽

State Of The Art ◽

Semantic Relations ◽

Dynamic Graphs ◽

Dynamic Graph ◽

Convolutional Network ◽

Bottom Up ◽

Multi Level ◽

Human Pose ◽

Relative Gains

Multi-person pose estimation aims to detect human keypoints from images with multiple persons. Bottom-up methods for multi-person pose estimation have attracted extensive attention, owing to their good balance between efficiency and accuracy. Recent bottom-up methods usually follow the principle of keypoint localization and grouping, where relations between keypoints are the key to grouping them. These relations naturally form a graph of keypoints, where the edges represent the relations between two nodes (i.e., keypoints). Existing bottom-up methods mainly define relations by empirically picking out edges from this graph, while omitting edges that may contain useful semantic relations. In this paper, we propose a novel Dynamic Graph Convolutional Module (DGCM) to model rich relations in the keypoints graph. Specifically, we take into account all relations (all edges of the graph) and construct dynamic graphs to tolerate large variations of human pose. The DGCM is quite lightweight, which allows it to be stacked like a pyramid architecture and learn structural relations from multi-level features. Our network with a single DGCM based on ResNet-50 achieves relative gains of 3.2% and 4.8% over state-of-the-art bottom-up methods on the COCO keypoints and MPII datasets, respectively.

Amylase, lipase, pancreatic isoamylase, and phospholipase A in diagnosis of acute pancreatitis

Clinical Chemistry ◽

10.1093/clinchem/41.8.1129 ◽

1995 ◽

Vol 41 (8) ◽

pp. 1129-1134 ◽

Cited By ~ 26

Author(s):

P Clavé ◽

S Guillaumes ◽

I Blanco ◽

N Nabau ◽

J Mercé ◽

...

Keyword(s):

Acute Pancreatitis ◽

Operating Characteristic ◽

Serum Amylase ◽

Acute Abdominal Pain ◽

Characteristic Curve ◽

Threshold Value ◽

Phospholipase A ◽

Diagnostic Efficiency ◽

Pancreatic Isoamylase ◽

Parallel Fashion

To determine the utility of serum amylase (AMY), lipase (Lp), pancreatic isoamylase (isoA), phospholipase A (PLA), and urine AMY in the diagnosis of acute pancreatitis, samples of serum and urine were obtained on admission and every day thereafter for 5 days from 384 patients with acute abdominal pain. Diagnostic accuracy, determined as the area under the receiver operating characteristic curve, was > 0.975 for serum AMY, Lp, isoA, and urine AMY. For each of these enzymes, a threshold value (two to six times the upper limit of the reference values) offering diagnostic efficiency > 95% could be determined. In contrast, accuracy and efficiency of serum PLA were low. The profiles of these enzymes in acute pancreatitis decreased in a parallel fashion over 5 days except for PLA. We conclude that diagnostic utilities are similar for serum AMY, Lp, isoA, and urine AMY for acute pancreatitis, provided that an appropriate threshold is established.
