scholarly journals A Comparative Study of Unsupervised Anomaly Detection Techniques Using Honeypot Data

2010 ◽  
Vol E93-D (9) ◽  
pp. 2544-2554 ◽  
Author(s):  
Jungsuk SONG ◽  
Hiroki TAKAKURA ◽  
Yasuo OKABE ◽  
Daisuke INOUE ◽  
Masashi ETO ◽  
...  
Author(s):  
Catherine Cheung ◽  
Julio J. Valdés ◽  
Richard Salas Chavez ◽  
Srishti Sehgal

In this work, the sensor data related to a diesel engine system and specifically its turbocharger subsystem were analyzed. An incident where the turbocharger seized was recorded by dozens of standard turbocharger-related sensors. By training models to distinguish between normal healthy operating conditions and deteriorated conditions, there is an opportunity to develop prognostic and predictive tools to ideally help prevent a similar occurrence in the future. Analysis of this event provides an opportunity to identify changes in equipment indicators with a known outcome. A number of data analysis tools were used to characterize the healthy and deteriorated states of the turbocharger system, including various supervised classification as well as semi-supervised and unsupervised anomaly detection techniques. The leader clustering algorithm was also implemented to reduce the amount of data to train and develop the models. This paper describes the results of this modeling process, validated by testing on healthy data from the same propulsion system and a second distinct one. Although this problem posed challenges due to the severely imbalanced class distribution, the supervised classifiers, in particular Support Vector Machine (SVM) and Random Forest (RF), performed very well in all metrics while the unsupervised anomaly detection models achieved near-perfect accuracy for identifying healthy turbocharger states.


Sensors ◽  
2019 ◽  
Vol 19 (11) ◽  
pp. 2451 ◽  
Author(s):  
Mohsin Munir ◽  
Shoaib Ahmed Siddiqui ◽  
Muhammad Ali Chattha ◽  
Andreas Dengel ◽  
Sheraz Ahmed

The need for robust unsupervised anomaly detection in streaming data is increasing rapidly in the current era of smart devices, where enormous data are gathered from numerous sensors. These sensors record the internal state of a machine, the external environment, and the interaction of machines with other machines and humans. It is of prime importance to leverage this information in order to minimize downtime of machines, or even avoid downtime completely by constant monitoring. Since each device generates a different type of streaming data, it is normally the case that a specific kind of anomaly detection technique performs better than the others depending on the data type. For some types of data and use-cases, statistical anomaly detection techniques work better, whereas for others, deep learning-based techniques are preferred. In this paper, we present a novel anomaly detection technique, FuseAD, which takes advantage of both statistical and deep-learning-based approaches by fusing them together in a residual fashion. The obtained results show an increase in area under the curve (AUC) as compared to state-of-the-art anomaly detection methods when FuseAD is tested on a publicly available dataset (Yahoo Webscope benchmark). The obtained results advocate that this fusion-based technique can obtain the best of both worlds by combining their strengths and complementing their weaknesses. We also perform an ablation study to quantify the contribution of the individual components in FuseAD, i.e., the statistical ARIMA model as well as the deep-learning-based convolutional neural network (CNN) model.


Anomaly detection has vital role in data preprocessing and also in the mining of outstanding points for marketing, network sensors, fraud detection, intrusion detection, stock market analysis. Recent studies have been found to concentrate more on outlier detection for real time datasets. Anomaly detection study is at present focuses on the expansion of innovative machine learning methods and on enhancing the computation time. Sentiment mining is the process to discover how people feel about a particular topic. Though many anomaly detection techniques have been proposed, it is also notable that the research focus lacks a comparative performance evaluation in sentiment mining datasets. In this study, three popular unsupervised anomaly detection algorithms such as density based, statistical based and cluster based anomaly detection methods are evaluated on movie review sentiment mining dataset. This paper will set a base for anomaly detection methods in sentiment mining research. The results show that density based (LOF) anomaly detection method suits best for the movie review sentiment dataset.


2020 ◽  
Vol 51 (7) ◽  
pp. 649-667
Author(s):  
Esteban Jove ◽  
José-Luis Casteleiro-Roca ◽  
Roberto Casado-Vara ◽  
Héctor Quintián ◽  
Juan Albino Méndez Pérez ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3005
Author(s):  
A. N. M. Bazlur Rashid ◽  
Mohiuddin Ahmed ◽  
Al-Sakib Khan Pathan

While anomaly detection is very important in many domains, such as in cybersecurity, there are many rare anomalies or infrequent patterns in cybersecurity datasets. Detection of infrequent patterns is computationally expensive. Cybersecurity datasets consist of many features, mostly irrelevant, resulting in lower classification performance by machine learning algorithms. Hence, a feature selection (FS) approach, i.e., selecting relevant features only, is an essential preprocessing step in cybersecurity data analysis. Despite many FS approaches proposed in the literature, cooperative co-evolution (CC)-based FS approaches can be more suitable for cybersecurity data preprocessing considering the Big Data scenario. Accordingly, in this paper, we have applied our previously proposed CC-based FS with random feature grouping (CCFSRFG) to a benchmark cybersecurity dataset as the preprocessing step. The dataset with original features and the dataset with a reduced number of features were used for infrequent pattern detection. Experimental analysis was performed and evaluated using 10 unsupervised anomaly detection techniques. Therefore, the proposed infrequent pattern detection is termed Unsupervised Infrequent Pattern Detection (UIPD). Then, we compared the experimental results with and without FS in terms of true positive rate (TPR). Experimental analysis indicates that the highest rate of TPR improvement was by cluster-based local outlier factor (CBLOF) of the backdoor infrequent pattern detection, and it was 385.91% when using FS. Furthermore, the highest overall infrequent pattern detection TPR was improved by 61.47% for all infrequent patterns using clustering-based multivariate Gaussian outlier score (CMGOS) with FS.


Forests ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 194
Author(s):  
Álvaro García Faura ◽  
Dejan Štepec ◽  
Matija Cankar ◽  
Miha Humar

Wood is considered one of the most important construction materials, as well as a natural material prone to degradation, with fungi being the main reason for wood failure in a temperate climate. Visual inspection of wood or other approaches for monitoring are time-consuming, and the incipient stages of decay are not always visible. Thus, visual decay detection and such manual monitoring could be replaced by automated real-time monitoring systems. The capabilities of such systems can range from simple monitoring, periodically reporting data, to the automatic detection of anomalous measurements that may happen due to various environmental or technical reasons. In this paper, we explore the application of Unsupervised Anomaly Detection (UAD) techniques to wood Moisture Content (MC) data. Specifically, data were obtained from a wood construction that was monitored for four years using sensors at different positions. Our experimental results prove the validity of these techniques to detect both artificial and real anomalies in MC signals, encouraging further research to enable their deployment in real use cases.


Sign in / Sign up

Export Citation Format

Share Document