A Fast Outlier Detection Algorithm for High Dimensional Categorical Data Streams

2007 ◽  
Vol 18 (4) ◽  
pp. 933 ◽  
Author(s):  
Xiao-Yun ZHOU
2012 ◽  
Vol 6-7 ◽  
pp. 621-624
Author(s):  
Hong Bin Fang

Outlier detection is an important field of data mining and is widely used in credit card fraud detection, network intrusion detection, and similar applications. This paper defines a similarity metric for high-dimensional data and the concept of class density. After redefining density-based outliers in high-dimensional space, it presents an outlier detection algorithm based on similarity measurement that combines hierarchical clustering with the similarity metric. Experimental results suggest that the algorithm has some value for outlier detection in high-dimensional data sets.


2010 ◽  
Vol 29 (3) ◽  
pp. 697-725 ◽  
Author(s):  
Anna Koufakou ◽  
Jimmy Secretan ◽  
Michael Georgiopoulos

2019 ◽  
Vol 16 (9) ◽  
pp. 3938-3944
Author(s):  
Atul Garg ◽  
Kamaljeet Kaur

Detecting outliers, or anomalies, in high-dimensional data remains a major challenge. Outlier detection techniques distinguish normal data from anomalous data by classifying new data as normal or abnormal. Many outlier detection algorithms have been proposed for high-dimensional data, each with its own benefits and limitations. This work considers a few of them: the Dice-Coefficient Index (DCI), the MapReduce function, and Linear Discriminant Analysis (LDA). MapReduce is used to handle large datasets, and LDA is used to reduce data dimensionality. A novel Hybrid Outlier Detection Algorithm (HbODA) is proposed for efficient detection of outliers in high-dimensional data. Its performance is analyzed in terms of efficiency, accuracy, computation cost, precision, recall, and related parameters. Experimental results on large real data sets show that the proposed algorithm detects outliers better than traditional methods.
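
For readers unfamiliar with the Dice coefficient named in the abstract above, the following is a minimal sketch of the standard set-based definition, 2|A ∩ B| / (|A| + |B|). The function name and the sample records are hypothetical, and the abstract does not say how HbODA applies the index, so this illustrates only the general measure, not the authors' method.

```python
def dice_coefficient(a, b):
    """Standard Dice similarity between two collections, treated as sets.

    Illustrative helper only; not taken from the HbODA paper.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Convention chosen here: two empty sets are treated as identical.
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical categorical records encoded as attribute=value tokens.
record_x = {"proto=tcp", "port=80", "flag=SYN"}
record_y = {"proto=tcp", "port=80", "flag=ACK"}
similarity = dice_coefficient(record_x, record_y)
```

A record whose similarity to every reference record is low would be a candidate outlier under a similarity-based scheme of this kind.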


2014 ◽  
Vol 6 ◽  
pp. 830402 ◽  
Author(s):  
Changhao Piao ◽  
Zhi Huang ◽  
Ling Su ◽  
Sheng Lu

The battery system is a key part of the electric vehicle. To detect outliers effectively while the battery system is running, a new high-dimensional data stream outlier detection algorithm (DSOD) based on angle distribution is proposed. First, to improve stability in high-dimensional space, an angle distribution-based outlier detection method is employed. Second, to reduce computational complexity, a small-scale calculation set is established for the data stream, composed of a normal set and a border set. To address concept drift, an update mechanism for the normal set and border set is developed. In this way, hidden abnormal points are detected rapidly. Experimental results on real data sets and battery system simulation data sets demonstrate that DSOD is more efficient than simple variance of angles (Simple VOA) and angle-based outlier detection (ABOD) and is well suited to evaluating battery system safety.
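
The angle-based idea behind the baselines named in the abstract above can be sketched as follows. This is a minimal, unoptimized illustration of a plain variance-of-angles score, assuming Euclidean points stored as tuples. It is not the DSOD algorithm, whose normal set, border set, and update mechanism the abstract does not specify in detail. The function name and sample data are hypothetical.

```python
import math
from itertools import combinations


def variance_of_angles(points, index):
    """Variance of the angles, seen from points[index], between all pairs
    of other points. A low variance suggests the point lies outside the
    bulk of the data, since the other points appear in a narrow cone."""
    p = points[index]
    others = [q for i, q in enumerate(points) if i != index]
    angles = []
    for a, b in combinations(others, 2):
        va = [x - y for x, y in zip(a, p)]
        vb = [x - y for x, y in zip(b, p)]
        norm_a = math.sqrt(sum(x * x for x in va))
        norm_b = math.sqrt(sum(x * x for x in vb))
        if norm_a == 0 or norm_b == 0:
            continue  # skip duplicates of p, whose direction is undefined
        cos = sum(x * y for x, y in zip(va, vb)) / (norm_a * norm_b)
        cos = max(-1.0, min(1.0, cos))  # guard against rounding error
        angles.append(math.acos(cos))
    if not angles:
        return 0.0
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)


# Hypothetical data: a small cluster plus one distant point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
scores = [variance_of_angles(points, i) for i in range(len(points))]
outlier_index = min(range(len(points)), key=lambda i: scores[i])
```

The pairwise loop makes each score quadratic in the number of points, which is the kind of cost that a small calculation set, as described in the abstract, is meant to reduce.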


2016 ◽  
Vol 12 (1) ◽  
pp. 35 ◽  
Author(s):  
Li Lian Sheng

Radio frequency identification (RFID) has been extensively deployed in retailing, supply chain management, object recognition, object monitoring and tracking, and many other fields. Detecting outliers in RFID data streams can help find abnormal activities and thus avoid disasters. To detect outliers in RFID data streams efficiently and effectively, we propose a fractal-based outlier detection algorithm. First, we build a monotone search space based on the self-similarity of fractals. Then, we propose two piecewise fractal models for RFID data streams and present an outlier detection algorithm based on the piecewise fractal model. Finally, we validate the efficiency and effectiveness of the proposed algorithm through extensive experiments.

