Evaluation of Unsupervised Anomaly Detection Methods in Sentiment Mining

Anomaly detection plays a vital role in data preprocessing and in mining outlying points for marketing, network sensors, fraud detection, intrusion detection, and stock market analysis. Recent studies have concentrated more on outlier detection for real-time datasets. Anomaly detection research currently focuses on developing innovative machine learning methods and on reducing computation time. Sentiment mining is the process of discovering how people feel about a particular topic. Although many anomaly detection techniques have been proposed, a comparative performance evaluation on sentiment mining datasets is still lacking. In this study, three popular families of unsupervised anomaly detection methods, namely density-based, statistical, and cluster-based methods, are evaluated on a movie review sentiment mining dataset. This paper lays a foundation for anomaly detection research in sentiment mining. The results show that the density-based method (LOF, Local Outlier Factor) is best suited to the movie review sentiment dataset.
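The density-based method that performed best here, LOF (Local Outlier Factor), compares the local density around each point with the densities around its k nearest neighbours. The following is a minimal from-scratch sketch of the standard LOF definition, not the authors' implementation; the function name `local_outlier_factor`, the use of Euclidean distance, exactly-k neighbourhoods, and the assumption that all points are distinct are illustrative choices.

```python
import math


def local_outlier_factor(points, k):
    """Return the LOF score of every point (Euclidean distance).

    Assumes len(points) > k and that all points are distinct. Ties at the
    k-distance are broken by index instead of enlarging the neighbourhood,
    a simplification of the textbook definition.
    """
    n = len(points)
    # Full pairwise distance matrix; O(n^2), fine for a small illustration.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (itself excluded) and its k-distance.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        nearest = order[:k]
        neighbours.append(nearest)
        k_distance.append(dist[i][nearest[-1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(k / sum(reach))

    # LOF: average density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    for p, s in zip(pts, local_outlier_factor(pts, k=3)):
        print(p, round(s, 2))
```

Scores near 1 mean a point is about as dense as its neighbourhood; scores well above 1 flag candidate outliers. The abstract does not say how reviews are turned into vectors, so the input representation is left open here.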

FuseAD: Unsupervised Anomaly Detection in Streaming Sensors Data by Fusing Statistical and Deep Learning Models

Sensors ◽

10.3390/s19112451 ◽

2019 ◽

Vol 19 (11) ◽

pp. 2451 ◽

Cited By ~ 13

Author(s):

Mohsin Munir ◽

Shoaib Ahmed Siddiqui ◽

Muhammad Ali Chattha ◽

Andreas Dengel ◽

Sheraz Ahmed

Keyword(s):

Deep Learning ◽

Anomaly Detection ◽

Internal State ◽

Arima Model ◽

Streaming Data ◽

Detection Technique ◽

Detection Methods ◽

Smart Devices ◽

Detection Techniques ◽

Unsupervised Anomaly Detection

The need for robust unsupervised anomaly detection in streaming data is increasing rapidly in the current era of smart devices, where enormous amounts of data are gathered from numerous sensors. These sensors record the internal state of a machine, the external environment, and the interaction of machines with other machines and humans. It is of prime importance to leverage this information in order to minimize machine downtime, or even avoid it completely, through constant monitoring. Since each device generates a different type of streaming data, it is normally the case that a specific kind of anomaly detection technique performs better than others depending on the data type. For some types of data and use cases, statistical anomaly detection techniques work better, whereas for others, deep-learning-based techniques are preferred. In this paper, we present a novel anomaly detection technique, FuseAD, which takes advantage of both statistical and deep-learning-based approaches by fusing them together in a residual fashion. The obtained results show an increase in area under the curve (AUC) compared with state-of-the-art anomaly detection methods when FuseAD is tested on a publicly available dataset (Yahoo Webscope benchmark). These results suggest that this fusion-based technique can obtain the best of both worlds by combining the strengths of the two approaches and compensating for their weaknesses. We also perform an ablation study to quantify the contribution of the individual components in FuseAD, i.e., the statistical ARIMA model and the deep-learning-based convolutional neural network (CNN) model.

Indirectly Supervised Anomaly Detection of Clinically Meaningful Health Events from Smart Home Data

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3439870 ◽

2021 ◽

Vol 12 (2) ◽

pp. 1-18

Author(s):

Jessamyn Dahmen ◽

Diane J. Cook

Keyword(s):

Anomaly Detection ◽

Time Series Data ◽

Bayesian Optimization ◽

Detection Methods ◽

Series Data ◽

Detection Techniques ◽

Health Events ◽

Warm Start ◽

Health Related ◽

Unsupervised Algorithms

Anomaly detection techniques can extract a wealth of information about unusual events. Unfortunately, these methods yield an abundance of findings that are not of interest, obscuring relevant anomalies. In this work, we improve upon traditional anomaly detection methods by introducing Isudra, an Indirectly Supervised Detector of Relevant Anomalies from time series data. Isudra employs Bayesian optimization to select time scales, features, base detector algorithms, and algorithm hyperparameters that increase true positive and decrease false positive detections. This optimization is driven by a small number of example anomalies, yielding an indirectly supervised approach to anomaly detection. Additionally, we enhance the approach by introducing a warm-start method that reduces optimization time between similar problems. We validate the feasibility of Isudra for detecting clinically relevant behavior anomalies from over 2 million sensor readings collected in five smart homes, reflecting 26 health events. Results indicate that indirectly supervised anomaly detection outperforms both supervised and unsupervised algorithms at detecting instances of health-related anomalies such as falls, nocturia, depression, and weakness.

Unsupervised Anomaly Detection with Distillated Teacher-Student Network Ensemble

Entropy ◽

10.3390/e23020201 ◽

2021 ◽

Vol 23 (2) ◽

pp. 201

Author(s):

Qinfeng Xiao ◽

Jing Wang ◽

Youfang Lin ◽

Wenbo Gongsa ◽

Ganghui Hu ◽

...

Keyword(s):

Anomaly Detection ◽

Multivariate Data ◽

Failure Detection ◽

Superior Performance ◽

Detection Algorithms ◽

Teacher Student ◽

Model Complex ◽

Unsupervised Anomaly Detection ◽

Real World Datasets ◽

Complex Features

We address the problem of unsupervised anomaly detection for multivariate data. Traditional machine-learning-based anomaly detection algorithms rely on specific assumptions about normal patterns and fail to model complex feature interactions and relations. Recent deep-learning-based methods are promising for extracting representations from complex features. These methods train an auxiliary task, e.g., reconstruction or prediction, on normal samples. They further assume that anomalies will perform poorly on the auxiliary task because anomalous samples are never seen during model optimization. However, this assumption does not always hold in practice: deep models may also perform the auxiliary task well on anomalous samples, causing anomalies to go undetected. To detect anomalies in multivariate data effectively, this paper introduces a teacher-student distillation-based framework, the Distillated Teacher-Student Network Ensemble (DTSNE). The teacher-student distillation paradigm is able to deal with high-dimensional complex features. In addition, an ensemble of student networks is better able to avoid generalizing the auxiliary task performance to anomalous samples. To validate the effectiveness of our model, we conduct extensive experiments on real-world datasets. Experimental results show superior performance of DTSNE over competing methods. Analysis and discussion of the behavior of our model are also provided in the experiment section.

Quantitative comparison of unsupervised anomaly detection algorithms for intrusion detection

Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing - SAC '19 ◽

10.1145/3297280.3297314 ◽

2019 ◽

Cited By ~ 5

Author(s):

Filipe Falcão ◽

Tommaso Zoppi ◽

Caio Barbosa Viera Silva ◽

Anderson Santos ◽

Baldoino Fonseca ◽

...

Keyword(s):

Intrusion Detection ◽

Anomaly Detection ◽

Quantitative Comparison ◽

Detection Algorithms ◽

Unsupervised Anomaly Detection

Multivariate Anomaly Detection for Earth Observations: A Comparison of Algorithms and Feature Extraction Techniques

10.5194/esd-2016-51 ◽

2016 ◽

Cited By ~ 1

Author(s):

Milan Flach ◽

Fabian Gans ◽

Alexander Brenning ◽

Joachim Denzler ◽

Markus Reichstein ◽

...

Keyword(s):

Feature Extraction ◽

Anomaly Detection ◽

Data Streams ◽

Multivariate Data ◽

Detection Methods ◽

Earth System ◽

Earth System Science ◽

System Science ◽

Detection Algorithms ◽

Earth Observations

Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advancing our understanding of, e.g., vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring the effects of climatic extreme events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only a few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations. This artificial experiment is needed because there is no 'gold standard' for the identification of anomalies in real Earth observations. Our results show that a well-chosen feature extraction step (e.g., subtracting seasonal cycles, or dimensionality reduction) is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify three detection algorithms (k-nearest neighbours mean distance, kernel density estimation, and a recurrence approach) and their combinations (ensembles) that outperform other multivariate approaches as well as univariate extreme event detection methods. Our results therefore provide an effective workflow to automatically detect anomalies in Earth system science data.
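Two ingredients highlighted in this abstract, subtracting seasonal cycles as a feature-extraction step and k-nearest-neighbours mean distance as a detector, can be combined in a few lines. The sketch below is not the authors' workflow: it assumes a known integer period, no missing values, and a mean seasonal cycle obtained by averaging each phase, and the function names are hypothetical. The toy series injects a value that is normal for a different phase, so it scores zero on the raw data but stands out after deseasonalisation.

```python
import math


def subtract_seasonal_cycle(series, period):
    """Remove the mean seasonal cycle from a multivariate series (list of tuples)."""
    n_vars = len(series[0])
    cycle = []
    for phase in range(period):
        rows = series[phase::period]  # all observations at this phase
        cycle.append([sum(r[v] for r in rows) / len(rows) for v in range(n_vars)])
    return [
        tuple(x[v] - cycle[t % period][v] for v in range(n_vars))
        for t, x in enumerate(series)
    ]


def knn_mean_distance(points, k):
    """Anomaly score = mean Euclidean distance to the k nearest other points."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / k)
    return scores


if __name__ == "__main__":
    base = [(0.0, 0.0), (5.0, 2.0), (10.0, 4.0), (5.0, 2.0)]  # one seasonal cycle
    series = [base[t % 4] for t in range(24)]
    series[9] = (10.0, 4.0)  # a phase-2 value appearing at a phase-1 time step
    raw = knn_mean_distance(series, k=3)
    des = knn_mean_distance(subtract_seasonal_cycle(series, period=4), k=3)
    print("raw score at t=9:", raw[9])
    print("deseasonalised score at t=9:", round(des[9], 2))
```

On the raw series the injected point has identical twins elsewhere in the cycle, so the detector alone cannot see it; this mirrors, on a toy scale, the abstract's finding that the feature-extraction step can matter more than the detector.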

Failure Modeling of a Propulsion Subsystem: Unsupervised and Semi-Supervised Approaches to Anomaly Detection

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419400196 ◽

2019 ◽

Vol 33 (11) ◽

pp. 1940019 ◽

Cited By ~ 2

Author(s):

Catherine Cheung ◽

Julio J. Valdés ◽

Richard Salas Chavez ◽

Srishti Sehgal

Keyword(s):

Anomaly Detection ◽

Clustering Algorithm ◽

Operating Conditions ◽

Sensor Data ◽

Support Vector ◽

Training Models ◽

Detection Techniques ◽

Failure Modeling ◽

Supervised Classifiers ◽

Unsupervised Anomaly Detection

In this work, sensor data from a diesel engine system, specifically its turbocharger subsystem, were analyzed. An incident in which the turbocharger seized was recorded by dozens of standard turbocharger-related sensors. Training models to distinguish between healthy operating conditions and deteriorated conditions offers an opportunity to develop prognostic and predictive tools that could help prevent a similar occurrence in the future. Analysis of this event makes it possible to identify changes in equipment indicators with a known outcome. A number of data analysis tools were used to characterize the healthy and deteriorated states of the turbocharger system, including various supervised classifiers as well as semi-supervised and unsupervised anomaly detection techniques. The leader clustering algorithm was also implemented to reduce the amount of data used to train and develop the models. This paper describes the results of this modeling process, validated by testing on healthy data from the same propulsion system and from a second, distinct one. Although the severely imbalanced class distribution posed challenges, the supervised classifiers, in particular Support Vector Machine (SVM) and Random Forest (RF), performed very well on all metrics, while the unsupervised anomaly detection models achieved near-perfect accuracy in identifying healthy turbocharger states.
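The leader clustering algorithm mentioned above is a single-pass method: each observation joins the first existing leader within a distance threshold, or otherwise becomes a new leader, and the leaders can then stand in for the full data set during training. A minimal sketch, assuming Euclidean distance and a hypothetical fixed `threshold`; the abstract does not specify the distance measure or threshold the authors used.

```python
import math


def leader_clustering(points, threshold):
    """Single-pass leader clustering.

    Returns (leaders, members): leaders[c] is the representative point of
    cluster c and members[c] lists the indices of the points assigned to it.
    """
    leaders, members = [], []
    for idx, p in enumerate(points):
        for c, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                members[c].append(idx)  # join the first leader close enough
                break
        else:
            # No leader within the threshold: this point starts a new cluster.
            leaders.append(p)
            members.append([idx])
    return leaders, members


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
    leaders, members = leader_clustering(data, threshold=0.5)
    print(leaders)  # reduced data set used in place of the raw points
    print(members)
```

Because assignment depends on the order in which points arrive, shuffling the data can change the leaders; this order sensitivity is the price of needing only one pass.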

egoDetect: Visual Detection and Exploration of Anomaly in Social Communication Network

Sensors ◽

10.3390/s20205895 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5895

Author(s):

Jiansu Pu ◽

Jingwen Zhang ◽

Hui Shao ◽

Tingting Zhang ◽

Yunbo Rao

Keyword(s):

Social Networks ◽

Anomaly Detection ◽

Communication Networks ◽

Social Communication ◽

Detection Method ◽

Detection Methods ◽

Visualization System ◽

Egocentric Network ◽

Unsupervised Anomaly Detection ◽

The Relationship

The development of the Internet has made social communication increasingly important for maintaining relationships between people. However, advertising and fraud are also growing rapidly and seriously affect our daily life, e.g., leading to money and time losses, spam, and privacy problems. Therefore, it is very important to detect anomalies in social networks. However, existing anomaly detection methods cannot guarantee a high detection accuracy, and, due to the lack of labeled data, the detection results cannot be used directly. In other words, human analysts are still needed in the loop to provide judgment for decision making. To help experts analyze and explore the results of anomaly detection in social networks more objectively and effectively, we propose a novel visualization system, egoDetect, which can detect anomalies in social communication networks efficiently. Because it is based on an unsupervised anomaly detection method, the system can detect anomalies without training and quickly provide an overview. We then explore an ego's topology and the relationship between egos and alters through a novel glyph based on the egocentric network. The system also provides rich interactions that let experts quickly navigate to users of interest for further exploration. We use an actual call dataset provided by an operator to evaluate our system. The results show that the proposed system is effective for anomaly detection in social networks.

Add-On Anomaly Threshold Technique for Improving Unsupervised Intrusion Detection on SCADA Data

Electronics ◽

10.3390/electronics9061017 ◽

2020 ◽

Vol 9 (6) ◽

pp. 1017 ◽

Cited By ~ 1

Author(s):

Abdulmohsen Almalawi ◽

Adil Fahad ◽

Zahir Tari ◽

Asif Irshad Khan ◽

Nouf Alzahrani ◽

...

Keyword(s):

Anomaly Detection ◽

Supervisory Control ◽

State Of The Art ◽

Current Approach ◽

Industrial Processes ◽

Parameter Choice ◽

Detection Algorithms ◽

Detection Approach ◽

Unsupervised Anomaly Detection ◽

Threshold Technique

Supervisory control and data acquisition (SCADA) systems monitor and supervise our daily infrastructure systems and industrial processes. Hence, the importance of securing the information systems of critical infrastructures cannot be overstated. The effectiveness of unsupervised anomaly detection approaches is sensitive to parameter choices, especially when the boundaries between normal and abnormal behaviours are not clearly distinguishable. The current approach to detecting anomalies in SCADA data relies on assumptions that define what an anomaly is, and these assumptions are controlled by a parameter choice. This paper proposes an add-on anomaly threshold technique that identifies observations whose anomaly scores are extreme and deviate significantly from the others; such observations are assumed to be "abnormal". Observations whose anomaly scores are significantly distant from the "abnormal" ones are assumed to be "normal". Ensemble-based supervised learning is then used to find a global and efficient anomaly threshold from the information on both "normal" and "abnormal" behaviours. The proposed technique can be added to any unsupervised anomaly detection approach to mitigate its sensitivity to such parameters and to improve the performance of unsupervised anomaly detection on SCADA data. Experimental results confirm that the proposed technique significantly improves the performance of two state-of-the-art unsupervised anomaly detection algorithms.
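The add-on idea can be sketched as three steps on the scores of any unsupervised detector: pseudo-label extreme scores as "abnormal", pseudo-label scores far from them as "normal", and derive a threshold from the two groups. In the sketch below, the robust z-score (median and median absolute deviation), the cut-offs `c_abnormal` and `c_normal`, and the midpoint rule are all hypothetical choices; in particular, the midpoint between the two pseudo-labelled groups is a deliberately simple stand-in for the ensemble-based supervised learner the paper proposes.

```python
import statistics


def robust_z(scores):
    """Robust z-scores using the median and the median absolute deviation (MAD)."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores]) or 1e-12  # avoid /0
    return [(s - med) / mad for s in scores]


def add_on_threshold(scores, c_abnormal=3.0, c_normal=1.5):
    """Derive an anomaly threshold from pseudo-labelled unsupervised scores.

    Returns None when no score is extreme enough to be pseudo-labelled abnormal.
    """
    z = robust_z(scores)
    abnormal = [s for s, zi in zip(scores, z) if zi > c_abnormal]  # extreme scores
    normal = [s for s, zi in zip(scores, z) if zi <= c_normal]     # clearly typical
    if not abnormal or not normal:
        return None
    # Stand-in for the supervised step: split halfway between the two groups.
    return (max(normal) + min(abnormal)) / 2


if __name__ == "__main__":
    scores = [10, 11, 9, 12, 10, 8, 11, 13, 40, 55, 11]  # toy detector output
    t = add_on_threshold(scores)
    print("threshold:", t)
    print("flagged:", [i for i, s in enumerate(scores) if s > t])
```

Scores between the two pseudo-labelled groups (13 in the toy data) are left unlabelled during fitting and are only classified afterwards by the learned threshold, which is the role the supervised step plays in the description above.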

Finding new physics without learning about it: anomaly detection as a tool for searches at colliders

The European Physical Journal C ◽

10.1140/epjc/s10052-020-08807-w ◽

2021 ◽

Vol 81 (1) ◽

Cited By ~ 1

Author(s):

M. Crispim Romão ◽

N. F. Castro ◽

R. Pedro

Keyword(s):

Anomaly Detection ◽

New Physics ◽

Machine Learning Techniques ◽

Detection Methods ◽

Support Vector ◽

Support Vector Data Description ◽

Vector Data ◽

Detection Techniques ◽

Learning Techniques ◽

New Strategy

In this paper we propose a new strategy, based on anomaly detection methods, to search for new physics phenomena at colliders independently of the details of such new events. For this purpose, machine learning techniques are trained using Standard Model events, with the corresponding outputs being sensitive to physics beyond it. We explore three anomaly detection (AD) methods that are novel in high-energy physics (HEP): Isolation Forest, Histogram-Based Outlier Detection, and Deep Support Vector Data Description, alongside the more customary autoencoder. In order to evaluate the sensitivity of the proposed approach, predictions from specific new physics models are considered and compared with those achieved when using fully supervised deep neural networks. A comparison between shallow and deep anomaly detection techniques is also presented. Our results demonstrate the potential of semi-supervised anomaly detection techniques to explore data from present and future hadron colliders extensively.
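Histogram-Based Outlier Detection (HBOS), one of the methods named above, builds one histogram per feature on background data and scores a new event by summing log(1 / bin height) over the bins it falls into, which treats the features as independent. A minimal sketch with static bin widths and histograms normalised so the tallest bin has height 1; the bin count, the `floor` used for empty or out-of-range bins, and the function names are hypothetical, and this is not the authors' HEP pipeline. In the paper's setting the training rows would be Standard Model events.

```python
import math


def fit_hbos(train, n_bins=10):
    """Fit one static-width histogram per feature; tallest bin normalised to 1."""
    models = []
    for v in range(len(train[0])):
        col = [row[v] for row in train]
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # constant feature: any width works
        counts = [0] * n_bins
        for x in col:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
        peak = max(counts)
        models.append((lo, hi, width, [c / peak for c in counts]))
    return models


def hbos_score(models, row, floor=1e-6):
    """Sum of log(1 / bin height) over features; higher means more anomalous."""
    score = 0.0
    for (lo, hi, width, heights), x in zip(models, row):
        if lo <= x <= hi:
            h = heights[min(int((x - lo) / width), len(heights) - 1)]
        else:
            h = 0.0  # outside the range seen in training
        score += math.log(1.0 / max(h, floor))
    return score


if __name__ == "__main__":
    background = [(i % 10, (3 * i) % 10) for i in range(100)]  # toy "normal" events
    model = fit_hbos(background)
    print(hbos_score(model, (3, 3)))    # typical event
    print(hbos_score(model, (50, 50)))  # far outside the background range
```

The method needs only one pass to build the histograms and one lookup per feature to score an event, which is what makes it attractive for large collider datasets; the price is that correlations between features are ignored.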

Recent Advances in Anomaly Detection Methods Applied to Aviation

Aerospace ◽

10.3390/aerospace6110117 ◽

2019 ◽

Vol 6 (11) ◽

pp. 117 ◽

Cited By ~ 11

Author(s):

Luis Basora ◽

Xavier Olive ◽

Thomas Dubot

Keyword(s):

Anomaly Detection ◽

Time Series Data ◽

Data Driven ◽

Sensor Data ◽

Detection Methods ◽

Series Data ◽

Detection Techniques ◽

Advantages And Disadvantages ◽

Recent Advances ◽

Flight Operations

Anomaly detection is an active area of research with numerous methods and applications. This survey reviews the state of the art of data-driven anomaly detection techniques and their application to the aviation domain. After a brief introduction to the main traditional data-driven methods for anomaly detection, we review the recent advances in the area of neural networks, deep learning, and temporal-logic-based learning. In particular, we cover unsupervised techniques applicable to time series data because of their relevance to the aviation domain, where labeled data are usually lacking and flight trajectories and sensor data are sequential, or temporal, in nature. The advantages and disadvantages of each method are presented in terms of computational efficiency and detection efficacy. The second part of the survey explores the application of anomaly detection techniques to aviation and their contributions to improving the safety and performance of flight operations and aviation systems. As far as we know, some of the presented methods have not yet found an application in the aviation domain. We review applications ranging from the identification of significant operational events in air traffic operations to the prediction of potential aviation system failures for predictive maintenance.
