A Data Stream Outlier Detection Algorithm Based on Reverse K Nearest Neighbors

This paper proposes SODRNN, a new data stream outlier detection algorithm based on reverse nearest neighbors. It uses the sliding window model, in which outlier queries detect anomalies in the current window. Each insertion or deletion update needs only one scan of the current window, which improves efficiency. A Query Manager procedure supports queries over the whole current window at arbitrary times, so the algorithm can capture concept drift in the data stream promptly. Experiments on both synthetic and real data sets show that SODRNN is both effective and efficient.
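As a rough illustration of the reverse nearest neighbor idea over a sliding window, the Python sketch below counts, for each point in the window, how many other points list it among their k nearest neighbors, and reports the points with the fewest such reverse neighbors as outlier candidates. It is not the SODRNN algorithm itself: it recomputes every neighbor list on each query rather than using the one-scan incremental update described in the abstract, and the names `SlidingWindowRkNN`, `rknn_counts`, and `query_outliers` are hypothetical.

```python
from collections import deque
import math


def knn_indices(points, i, k):
    """Indices of the k points nearest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def rknn_counts(points, k):
    """For each point, count how many other points have it among their k nearest neighbors."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


class SlidingWindowRkNN:
    """Hypothetical helper: fixed-size window with an on-demand outlier query."""

    def __init__(self, window_size, k):
        # deque(maxlen=...) evicts the oldest point automatically on overflow.
        self.window = deque(maxlen=window_size)
        self.k = k

    def insert(self, point):
        self.window.append(tuple(point))

    def query_outliers(self, n):
        """Return the n points in the current window with the fewest reverse neighbors."""
        points = list(self.window)
        counts = rknn_counts(points, self.k)  # naive full recomputation (assumption)
        ranked = sorted(range(len(points)), key=lambda i: counts[i])
        return [points[i] for i in ranked[:n]]


detector = SlidingWindowRkNN(window_size=6, k=2)
for value in [100.0, 0.0, 1.0, 2.1, 3.3, 4.6, 20.0]:
    detector.insert((value,))       # 100.0 is evicted once the window is full
print(detector.query_outliers(1))   # -> [(20.0,)]
```

Because the query runs against whatever the window currently holds, a shift in the stream changes the neighbor lists and therefore the reported outliers, which is the intuition behind querying the current window at arbitrary times.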


Research on Outlier Detection Algorithm for Evaluation of Battery System Safety

Advances in Mechanical Engineering ◽ 10.1155/2014/830402 ◽ 2014 ◽ Vol 6 ◽ pp. 830402 ◽ Cited By ~ 3

Author(s): Changhao Piao ◽ Zhi Huang ◽ Ling Su ◽ Sheng Lu

Keyword(s): Outlier Detection ◽ Data Stream ◽ Concept Drift ◽ Detection Algorithm ◽ High Dimensional ◽ Small Scale ◽ Data Sets ◽ Angle Distribution ◽ System Safety ◽ Battery System

The battery system is a key part of the electric vehicle. To detect outliers effectively while the battery system is running, a new high-dimensional data stream outlier detection algorithm (DSOD) based on angle distribution is proposed. First, to improve the stability of the algorithm in high-dimensional space, an angle distribution-based outlier detection method is employed. Second, to reduce the computational complexity, a small-scale calculation set of the data stream is established, composed of a normal set and a border set. To address concept drift, an update mechanism for the normal set and border set is developed. In this way, hidden abnormal points are detected rapidly. Experimental results on real data sets and battery system simulation data sets demonstrate that DSOD is more efficient than simple variance of angles (Simple VOA) and angle-based outlier detection (ABOD) and is well suited to the evaluation of battery system safety.
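To make the angle-distribution idea concrete, here is a minimal Python sketch of a plain variance-of-angles score: for a given point, it measures the angles between the directions to every pair of other points and takes their variance. A point far from the rest sees all other points within a narrow cone, so its variance is small. This is only an illustration of the general principle, not DSOD: there is no normal set, border set, or update mechanism, and the function names are hypothetical. It assumes at least three distinct points.

```python
import math
from itertools import combinations


def angle_at(p, a, b):
    """Angle in radians at p between the vectors p->a and p->b."""
    va = [x - y for x, y in zip(a, p)]
    vb = [x - y for x, y in zip(b, p)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.hypot(*va) * math.hypot(*vb)  # zero if a or b equals p (assumed not to happen)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def variance_of_angles(points, i):
    """Population variance of the angles at points[i] over all pairs of other points."""
    p = points[i]
    others = [q for j, q in enumerate(points) if j != i]
    angles = [angle_at(p, a, b) for a, b in combinations(others, 2)]
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)


points = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1), (20, 20)]
scores = [variance_of_angles(points, i) for i in range(len(points))]
# The lowest score marks the strongest outlier candidate under this measure.
print(scores.index(min(scores)))  # -> 5, the point (20, 20)
```

The score needs every pair of other points, so its cost grows cubically when applied to all points; that cost is the motivation the abstract gives for working on a small-scale calculation set.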


An Improved Outlier Detection Algorithm Based on Reverse K-Nearest Neighbors of Adaptive Parameters

Lecture Notes in Electrical Engineering - Frontier and Future Development of Information Technology in Medicine and Education ◽ 10.1007/978-94-007-7618-0_47 ◽ 2013 ◽ pp. 477-487

Author(s): Xie Fangfang ◽ Xu Liancheng ◽ Chi Xuezhi ◽ Zhu Zhenfang

Keyword(s): Outlier Detection ◽ Nearest Neighbors ◽ Detection Algorithm ◽ K Nearest Neighbors ◽ Adaptive Parameters


A robust method for inverse transport modeling of atmospheric emissions using blind outlier detection

Geoscientific Model Development ◽ 10.5194/gmd-7-2303-2014 ◽ 2014 ◽ Vol 7 (5) ◽ pp. 2303-2311 ◽ Cited By ~ 9

Author(s): M. Martinez-Camara ◽ B. Béjar Haro ◽ A. Stohl ◽ M. Vetterli

Keyword(s): Outlier Detection ◽ Environmental Concern ◽ Measurement Data ◽ Real Data ◽ Detection Algorithm ◽ Data Sets ◽ Data Set ◽ Heavy Tailed ◽ Improved Performance ◽ Inverse Transport

Abstract. Emissions of harmful substances into the atmosphere are a serious environmental concern. In order to understand and predict their effects, it is necessary to estimate the exact quantity and timing of the emissions from sensor measurements taken at different locations. There are a number of methods for solving this problem. However, these existing methods assume Gaussian additive errors, making them extremely sensitive to outlier measurements. We first show that the errors in real-world measurement data sets come from a heavy-tailed distribution, i.e., include outliers. Hence, we propose robustifying the existing inverse methods by adding a blind outlier-detection algorithm. The improved performance of our method is demonstrated on a real data set and compared to previously proposed methods. For the blind outlier detection, we first use an existing algorithm, RANSAC, and then propose a modification called TRANSAC, which provides a further performance improvement.
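For readers unfamiliar with RANSAC, the Python sketch below shows the generic procedure on a toy line-fitting problem: repeatedly fit a model to a small random sample, count the measurements that agree with it within a threshold, and keep the model with the largest consensus set. Measurements outside that set are treated as outliers without any prior labeling. The inverse transport setting of the paper and its TRANSAC modification are not reproduced here; the data, threshold, iteration count, and function names are all illustrative assumptions.

```python
import random


def fit_line(p, q):
    """Slope and intercept of the line through two points with distinct x."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, threshold, iterations=200, seed=0):
    """Return ((slope, intercept), inlier_indices) for the largest consensus set found."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    best_model, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample: two points define a line
        if p[0] == q[0]:
            continue  # a vertical pair cannot define y = slope * x + intercept
        slope, intercept = fit_line(p, q)
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers


# Toy data: y = 2x + 1 with two grossly corrupted measurements.
data = [(x, 2 * x + 1) for x in range(10)]
data[3] = (3, 40)
data[7] = (7, -25)
model, inliers = ransac_line(data, threshold=0.5)
outliers = [i for i in range(len(data)) if i not in inliers]
print(model, outliers)
```

A least-squares fit over all ten points would be pulled toward the two corrupted values, whereas the consensus-based fit ignores them, which is the sensitivity to outliers that the abstract attributes to methods assuming Gaussian errors.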


A Mixture Model-Based Combination Approach for Outlier Detection

International Journal of Artificial Intelligence Tools ◽ 10.1142/s0218213014600215 ◽ 2014 ◽ Vol 23 (04) ◽ pp. 1460021 ◽ Cited By ~ 2

Author(s): Mohamed Bouguessa

Keyword(s): Outlier Detection ◽ Mixture Model ◽ Real Data ◽ Detection Algorithm ◽ Automatic Identification ◽ Data Sets ◽ Score Vector ◽ Data Object ◽ Multivariate Beta ◽ To Come

In this paper, we propose an approach that combines different outlier detection algorithms to improve effectiveness. To this end, we first estimate an outlier score vector for each data object. Each element of the estimated vectors corresponds to an outlier score produced by a specific outlier detection algorithm. We then use the multivariate beta mixture model to cluster the outlier score vectors into several components so that the component that corresponds to the outliers can be identified. A notable feature of the proposed approach is the automatic identification of outliers, whereas most existing methods either return only a ranked list of points, expecting the outliers to come first, or require empirical threshold estimation to identify outliers. Experimental results on both synthetic and real data sets show that our approach substantially enhances the accuracy of the base outlier detectors considered in the combination and overcomes their drawbacks.


A Novel Drift Detection Algorithm Based on Features’ Importance Analysis in a Data Streams Environment

Journal of Artificial Intelligence and Soft Computing Research ◽ 10.2478/jaiscr-2020-0019 ◽ 2020 ◽ Vol 10 (4) ◽ pp. 287-298

Author(s): Piotr Duda ◽ Krzysztof Przybyszewski ◽ Lipo Wang

Keyword(s): Random Forest ◽ Data Streams ◽ Data Stream ◽ Concept Drift ◽ Ensemble Methods ◽ Real Data ◽ Relevant Information ◽ Detection Algorithm ◽ Important Indicator ◽ Features Importance

Abstract. The training set consists of many features that influence the classifier to different degrees. Choosing the most important features and rejecting those that do not carry relevant information is of great importance to the operation of the learned model. In the case of data streams, the importance of the features may additionally change over time. Such changes affect the performance of the classifier but can also be an important indicator of concept drift. In this work, we propose a new algorithm for data stream classification, called Random Forest with Features Importance (RFFI), which uses the measure of feature importance as a drift detector. The RFFI algorithm adapts solutions inspired by the Random Forest algorithm to data stream scenarios. The proposed algorithm combines the ability of ensemble methods to handle slow changes in a data stream with a new method for detecting the occurrence of concept drift. The work contains an experimental analysis of the proposed algorithm, carried out on synthetic and real data.


The Implementation of Subspace Outlier Detection in K-Nearest Neighbors to Improve Accuracy in Bank Marketing Data

International Journal of Emerging Trends in Engineering Research ◽ 10.30534/ijeter/2020/44822020 ◽ 2020 ◽ Vol 8 (2) ◽ pp. 545-550

Author(s): Dimas Aryo Anggoro

Keyword(s): Outlier Detection ◽ Nearest Neighbors ◽ K Nearest Neighbors ◽ Improve Accuracy ◽ Marketing Data ◽ Bank Marketing


Identifying buzz in social media: a hybrid approach using artificial bee colony and k-nearest neighbors for outlier detection

Social Network Analysis and Mining ◽ 10.1007/s13278-017-0461-2 ◽ 2017 ◽ Vol 7 (1) ◽ Cited By ~ 16

Author(s): Reema Aswani ◽ S. P. Ghrera ◽ Arpan Kumar Kar ◽ Satish Chandra

Keyword(s): Social Media ◽ Outlier Detection ◽ Artificial Bee Colony ◽ Hybrid Approach ◽ Nearest Neighbors ◽ K Nearest Neighbors ◽ Bee Colony


A Reformed K-Nearest Neighbors Algorithm for Big Data Sets

Journal of Computer Science ◽ 10.3844/jcssp.2018.1213.1225 ◽ 2018 ◽ Vol 14 (9) ◽ pp. 1213-1225 ◽ Cited By ~ 2

Author(s): Vo Ngoc Phu ◽ Vo Thi Ngoc Tran

Keyword(s): Big Data ◽ Nearest Neighbors ◽ Data Sets ◽ K Nearest Neighbors


A Bi-directional Fuzzy C-Means Clustering Ensemble Algorithm Considering Local Information

International Journal of Computational Intelligence Systems ◽ 10.1007/s44196-021-00014-z ◽ 2021 ◽ Vol 14 (1)

Author(s): Chunhua Ren ◽ Linfu Sun

Keyword(s): Clustering Algorithms ◽ Real Data ◽ Local Information ◽ Data Sets ◽ Clustering Ensemble ◽ K Nearest Neighbors ◽ Fuzzy C Means ◽ Clustering Quality ◽ Fuzzy C Means Clustering ◽ Fcm Clustering

Abstract. The classic Fuzzy C-means (FCM) algorithm has limited clustering performance and is prone to misclassifying border points. This study offers a bi-directional FCM clustering ensemble approach that takes local information into account (LI_BIFCM) to overcome these challenges and increase clustering quality. First, various membership matrices are created by running FCM multiple times with randomized initial cluster centers, and a vertical ensemble is performed using the maximum membership principle. Second, after each execution of FCM, multiple local membership matrices of the sample points are created using multiple K-nearest neighbors, and a horizontal ensemble is performed. Multiple horizontal ensembles can be created from multiple FCM runs. Finally, the final clustering results are obtained by combining the vertical and horizontal clustering ensembles. Twelve data sets were chosen for testing from both synthetic and real data sources. In the experiments, LI_BIFCM outperformed four traditional clustering algorithms and three clustering ensemble algorithms. Furthermore, the final clustering results have a weak correlation with the bi-directional cluster ensemble parameters, indicating that the suggested technique is robust.


A Data Stream Outlier Detection Algorithm Based on Reverse K Nearest Neighbors

2010 International Symposium on Computational Intelligence and Design ◽ 10.1109/iscid.2010.149 ◽ 2010 ◽ Cited By ~ 2

Author(s): Cao Lijun ◽ Liu Xiyin ◽ Zhou Tiejun ◽ Zhang Zhongping ◽ Liu Aiyong

Keyword(s): Data Stream ◽ Nearest Neighbors ◽ K Nearest Neighbors
