Unsupervised Anomaly Detection Based on Deep Autoencoding and Clustering

2021, Vol 2021, pp. 1-8
Author(s): Chuanlei Zhang, Jiangtao Liu, Wei Chen, Jinyuan Shi, Minda Yao, ...

Unsupervised anomaly detection on high-dimensional or multidimensional data occupies a very important position in machine learning and industrial applications; in network security especially, anomaly detection on network data is particularly important. The key to anomaly detection is density estimation. Although dimension reduction and density estimation methods have made great progress in recent years, most dimension reduction methods struggle to retain the key information of the original data. Recent studies have shown that the deep autoencoder (DAE) can address this problem well. To improve the performance of unsupervised anomaly detection, we propose an anomaly detection scheme based on a DAE and clustering methods. The DAE is trained to learn a compressed representation of the input data, which is then fed to a clustering approach. The scheme exploits the DAE's ability to produce both a low-dimensional representation and a reconstruction error for each high-dimensional or multidimensional input sample. It eliminates redundant information contained in the data, improves the ability of clustering methods to identify abnormal samples, and reduces the amount of computation. To verify the effectiveness of the proposed scheme, extensive comparison experiments were conducted against traditional dimension reduction algorithms combined with clustering methods. The results demonstrate that, in most cases, the proposed scheme outperforms the traditional dimension reduction algorithms with different clustering methods.
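As a rough illustration of the pipeline this abstract describes, the sketch below trains a linear autoencoder by gradient descent (a hypothetical stand-in for the deep autoencoder) on synthetic data whose inliers lie on a 2-D subspace of a 10-D space, then uses the per-sample reconstruction error as the anomaly signal. All data, dimensions, and learning-rate choices are made up; the paper's actual clustering stage is replaced here by a simple top-k selection for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 inliers lie on a 2-D subspace of a 10-D space;
# 5 off-subspace points play the role of anomalies.
W_true = rng.normal(size=(2, 10))
X = np.vstack([rng.normal(size=(200, 2)) @ W_true,
               rng.normal(scale=4.0, size=(5, 10))])

# A linear autoencoder stands in for the DAE: encoder E (10 -> 2),
# decoder D (2 -> 10), trained by gradient descent on the MSE loss.
E = rng.normal(scale=0.1, size=(10, 2))
D = rng.normal(scale=0.1, size=(2, 10))
lr = 0.005
for _ in range(2000):
    Z = X @ E                           # low-dimensional codes
    G = Z @ D - X                       # reconstruction residual
    gD = Z.T @ G / len(X)               # dL/dD
    gE = X.T @ (G @ D.T) / len(X)       # dL/dE
    D -= lr * gD
    E -= lr * gE

# Per-sample reconstruction error: the signal the scheme hands to the
# clustering stage (replaced here by picking the 5 largest errors).
err = np.linalg.norm(X @ E @ D - X, axis=1)
flagged = np.argsort(err)[-5:]
```

In the scheme itself, both the codes `Z` and the error `err` would be concatenated and passed to a clustering method such as k-means or a Gaussian mixture, rather than thresholded directly.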

2022, Vol 70 (3), pp. 5363-5381
Author(s): Amgad Muneer, Shakirah Mohd Taib, Suliman Mohamed Fati, Abdullateef O. Balogun, Izzatdin Abdul Aziz

Sensors, 2020, Vol 20 (24), pp. 7285
Author(s): Donghyun Kim, Sangbong Lee, Jihwan Lee

This study proposes an unsupervised anomaly detection method that uses sensor streams from a marine engine to detect anomalous system behavior, a possible sign of impending system failure. Previous works on marine engine anomaly detection proposed clustering-based or statistical control chart-based approaches that are either unstable with respect to the choice of hyperparameters or fit high-dimensional datasets poorly. As a remedy to this limitation, this study adopts an ensemble-based approach to anomaly detection: several anomaly detectors with varying hyperparameters are trained in parallel, and their results are combined in the anomaly detection phase. Because an anomaly is detected by the combination of different detectors, the method is robust to the choice of hyperparameters without loss of accuracy. To demonstrate the methodology, an actual dataset obtained from a 200,000-ton cargo vessel of a Korean shipping company, powered by a two-stroke diesel engine, is analyzed. Anomalies were successfully detected in this high-dimensional and large-scale dataset. After detection, clustering analysis was conducted on the anomalous observations to examine anomaly patterns. By investigating each cluster's feature distribution, several common patterns of abnormal behavior were successfully visualized. Although the data come from a two-stroke diesel engine, the method can be applied to various types of marine engine.
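The ensemble idea in this abstract can be sketched in a few lines: train the same detector under several hyperparameter settings, rank-normalise each detector's scores, and average the ranks so no single setting dominates. The k-nearest-neighbour distance detector below is a hypothetical stand-in (the paper does not specify its base detectors), and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(150, 4)),            # normal readings (synthetic)
               rng.normal(loc=6.0, size=(3, 4))])    # injected anomalies

def knn_score(X, k):
    """Anomaly score = distance to the k-th nearest neighbour."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)          # column 0 is the point itself (distance 0)
    return d[:, k]

# Ensemble over hyperparameters: rank-normalise each detector's scores
# to [0, 1] before averaging, so the combination is robust to any one k.
ks = (3, 5, 10, 20)
scores = np.zeros(len(X))
for k in ks:
    s = knn_score(X, k)
    scores += s.argsort().argsort() / (len(X) - 1)
scores /= len(ks)

anomalies = np.argsort(scores)[-3:]   # top-3 combined scores
```

Rank averaging is one common combination rule; score averaging after z-normalisation would be an equally reasonable choice.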


Anomaly detection is one of the most important tasks in data mining, helping to improve scalability, accuracy, and efficiency. During the extraction process, an outside source may damage the original dataset, which is treated as an intrusion. Avoiding such intrusions while maintaining anomaly detection in a densely populated environment is a further difficult task. For this purpose, Grid Partitioning for Anomaly Detection (GPAD) has been proposed for high-density environments. The technique detects outliers using a grid partitioning approach together with a density-based outlier detection scheme. Initially, the dataset is split into a grid format, with an equal share of the data space allocated to each cell. The density of each grid cell is then compared to that of its neighboring cells in a zigzag manner; based on this comparison, cells of markedly lower density are detected as outliers and eliminated. The proposed GPAD reduces complexity and increases accuracy, as demonstrated in the simulation results.
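A toy sketch of the grid-based idea: partition the bounding box into cells, count points per cell, and flag points in abnormally sparse cells. Note this simplifies GPAD's zigzag neighbour-to-neighbour comparison into a single global density test, and all data and thresholds are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
pts = np.vstack([rng.uniform(0.0, 1.0, size=(400, 2)),   # dense region
                 [[2.5, 2.5]]])                          # isolated point

# Step 1: partition the bounding box into a G x G grid and count the
# points falling into each cell.
G = 6
lo, hi = pts.min(axis=0), pts.max(axis=0)
cell = np.minimum(((pts - lo) / (hi - lo) * G).astype(int), G - 1)
counts = np.zeros((G, G), dtype=int)
for i, j in cell:
    counts[i, j] += 1

# Step 2: a non-empty cell far sparser than the average non-empty cell
# is treated as an outlier cell; the points inside it are the outliers.
# (GPAD instead compares each cell to its neighbours in a zigzag scan.)
mean_nonzero = counts[counts > 0].mean()
is_outlier_cell = (counts > 0) & (counts < 0.05 * mean_nonzero)
outliers = [p for p, (i, j) in enumerate(cell) if is_outlier_cell[i, j]]
```

The appeal of the grid approach is cost: counting is linear in the number of points, so no pairwise distance computation is needed.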


2008, Vol 65 (6), pp. 1941-1954
Author(s): Illia Horenko

Abstract: A problem of simultaneous dimension reduction and identification of hidden attractive manifolds in multidimensional data with noise is considered. The problem is approached in two consecutive steps: (i) embedding the original data in a sufficiently high-dimensional extended space in a way proposed by Takens in his embedding theorem, followed by (ii) a minimization of the residual functional. The residual functional is constructed to measure the distance between the original data in extended space and their reconstruction based on a low-dimensional description. The reduced representation of the analyzed data results from projection onto a fixed number of unknown low-dimensional manifolds. Two specific forms of the residual functional are proposed, defining two different types of essential coordinates: (i) localized essential orthogonal functions (EOFs) and (ii) localized functions called principal original components (POCs). The application of the framework is exemplified both on a Lorenz attractor model with measurement noise and on historical air temperature data. It is demonstrated how the new method can be used for the elimination of noise and identification of the seasonal low-frequency components in meteorological data. An application of the proposed POCs in the context of constructing low-dimensional predictive models is presented.
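Step (i) of this framework, the Takens delay embedding, can be sketched directly; the sketch below then applies plain global EOFs (an SVD) to the embedded data, whereas the paper minimises a residual functional to obtain *localized* EOFs/POCs. The signal, embedding dimension, and delay are all arbitrary illustrative choices.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Takens delay embedding: row t is (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# A scalar observable of a periodic system: the attractor is a closed
# curve that the 1-D record cannot show but the embedding recovers.
t = np.linspace(0.0, 20.0 * np.pi, 2000)
Y = delay_embed(np.sin(t), dim=3, tau=25)

# Global EOFs of the embedded trajectory via SVD of the centred data.
Yc = Y - Y.mean(axis=0)
s = np.linalg.svd(Yc, compute_uv=False)
# Only two singular values carry variance: the limit cycle is 2-D, so
# the third embedding coordinate is redundant and can be projected out.
```

Projecting `Yc` onto the two leading right singular vectors gives the reduced, denoised representation, which is the role the localized EOFs/POCs play in the paper's more general setting.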


2021, Vol 2021, pp. 1-19
Author(s): Shicheng Li, Shumin Lai, Yan Jiang, Wenle Wang, Yugen Yi

Anomaly detection (AD) aims to distinguish the data points that are inconsistent with the overall pattern of the data. Recently, unsupervised anomaly detection methods have attracted considerable attention. Among these methods, feature representation (FR) plays an important role and can directly affect anomaly detection performance. Sparse representation (SR) can be regarded as one of the matrix factorization (MF) methods, which are powerful tools for FR. However, the original SR has some limitations. On the one hand, it learns only shallow feature representations, which leads to poor anomaly detection performance. On the other hand, the local geometric structure of the data is ignored. To address these shortcomings, a graph regularized deep sparse representation (GRDSR) approach is proposed for unsupervised anomaly detection in this work. In GRDSR, a deep representation framework is first designed by extending single-layer MF to multilayer MF for extracting hierarchical structure from the original data. Next, a graph regularization term is introduced to capture the intrinsic local geometric structure of the original data during FR, making the deep features preserve the neighborhood relationships well. Then, an L1-norm-based sparsity constraint is added to enhance the discriminative ability of the deep features. Finally, the reconstruction error is used to distinguish anomalies. To demonstrate the effectiveness of the proposed approach, we conduct extensive experiments on ten datasets. Compared with state-of-the-art methods, the proposed approach achieves the best performance.
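Of the pieces in the GRDSR objective, the graph regularization term is the most self-contained to illustrate. The sketch below builds a symmetric k-NN affinity graph over hypothetical samples and verifies the identity Tr(Zᵀ L Z) = ½ Σᵢⱼ Wᵢⱼ‖zᵢ − zⱼ‖², which is why the term pulls codes of neighbouring inputs together. The multilayer factorization and L1 constraint are omitted; all data and sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 8))          # 50 samples, 8 features (synthetic)

# Symmetric k-NN affinity graph W over the samples, and its graph
# Laplacian L = D - W (D = diagonal degree matrix).
k = 5
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.zeros_like(d2)
nn = np.argsort(d2, axis=1)[:, 1 : k + 1]   # skip self at column 0
for i, js in enumerate(nn):
    W[i, js] = 1.0
    W[js, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

# The regulariser added to the factorization objective: Tr(Z^T L Z),
# equal to the pairwise penalty 0.5 * sum_ij W_ij ||z_i - z_j||^2, so
# inputs that are graph neighbours are pushed toward nearby codes.
Z = rng.normal(size=(50, 3))          # stand-in deep codes
penalty = np.trace(Z.T @ L @ Z)
pairwise = 0.5 * sum(W[i, j] * ((Z[i] - Z[j]) ** 2).sum()
                     for i in range(50) for j in range(50))
```

In the full method this penalty is minimised jointly with the multilayer reconstruction loss, so the learned deep features inherit the neighbourhood structure of the inputs.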


2017
Author(s): Sahir Rai Bhatnagar, Yi Yang, Budhachandra Khundrakpam, Alan C Evans, Mathieu Blanchette, ...

Abstract: Predicting a phenotype and understanding which variables improve that prediction are two very challenging and overlapping problems in analysis of high-dimensional data such as those arising from genomic and brain imaging studies. It is often believed that the number of truly important predictors is small relative to the total number of variables, making computational approaches to variable selection and dimension reduction extremely important. To reduce dimensionality, commonly-used two-step methods first cluster the data in some way, and build models using cluster summaries to predict the phenotype.

It is known that important exposure variables can alter correlation patterns between clusters of high-dimensional variables, i.e., alter network properties of the variables. However, it is not well understood whether such altered clustering is informative in prediction. Here, assuming there is a binary exposure with such network-altering effects, we explore whether use of exposure-dependent clustering relationships in dimension reduction can improve predictive modelling in a two-step framework. Hence, we propose a modelling framework called ECLUST to test this hypothesis, and evaluate its performance through extensive simulations.

With ECLUST, we found improved prediction and variable selection performance compared to methods that do not consider the environment in the clustering step, or to methods that use the original data as features. We further illustrate this modelling framework through the analysis of three data sets from very different fields, each with high dimensional data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.
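The core premise, that a binary exposure can alter the correlation (network) structure among a block of variables, is easy to demonstrate on synthetic data. The sketch below is a toy illustration only (it is not the eclust package): a latent driver correlates variables 0-5 in the exposed group alone, so clustering within each exposure group finds different structure, and the cluster average becomes the reduced feature of the two-step framework.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 12
e = rng.integers(0, 2, size=n)            # binary exposure (synthetic)

# Variables 0-5 share a latent driver only when e == 1: the exposure
# alters the correlation structure of the block, which is the signal
# exposure-dependent clustering is meant to exploit.
X = rng.normal(size=(n, p))
latent = rng.normal(size=n)
X[e == 1, :6] += latent[e == 1, None]

def mean_abs_corr(M):
    """Mean absolute off-diagonal correlation among the columns of M."""
    C = np.corrcoef(M, rowvar=False)
    return np.abs(C[np.triu_indices(M.shape[1], 1)]).mean()

r_exposed = mean_abs_corr(X[e == 1][:, :6])    # strong correlation
r_unexposed = mean_abs_corr(X[e == 0][:, :6])  # near zero

# Two-step reduction: summarise the exposure-defined cluster by its
# average, then use that single feature (plus e) in the predictive model.
summary = X[:, :6].mean(axis=1)
```

A clustering run on the pooled data would blur this group-specific structure, which is the failure mode the exposure-dependent clustering step is designed to avoid.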

