A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams

Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.
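
The local-versus-global distinction above can be illustrated with a minimal, pure-Python sketch of the LOF idea. It is a simplified illustration, not any of the reviewed algorithms: it uses exactly k neighbours (ties at the k-distance are ignored), assumes no duplicate points, and the data, the helper names `k_nearest` and `lof_scores`, and the choice k=3 are all hypothetical.

```python
import math

def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    pairs = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return pairs[:k]

def lof_scores(points, k=3):
    """Compute a simplified LOF score per point (assumes no duplicate points)."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neigh]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical data: a dense cluster, a sparse cluster, and one point near the dense one.
dense = [(0.1 * x, 0.1 * y) for x in range(3) for y in range(3)]
sparse = [(10 + 2.0 * x, 10 + 2.0 * y) for x in range(3) for y in range(3)]
candidate = (0.8, 0.8)
points = dense + sparse + [candidate]

scores = lof_scores(points, k=3)
# Nearest-neighbour distance per point, as a stand-in for a global distance criterion.
nn_dist = [k_nearest(points, i, 1)[0][0] for i in range(len(points))]
print("candidate LOF:", round(scores[-1], 2))
print("largest LOF among cluster points:", round(max(scores[:-1]), 2))
```

In this toy data the candidate is closer to its nearest neighbour than any point of the sparse cluster is to its own, so a single global distance threshold would not separate it. Its LOF score is nevertheless well above 1, while the cluster points stay close to 1.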

Genetic-based Summarization for Local Outlier Detection in Data Stream

International Journal of Intelligent Systems and Applications, doi:10.5815/ijisa.2021.01.05, 2021, Vol 13 (1), pp. 58-68

Author(s): Mohamed Sakr, Walid Atwa, Arabi Keshk

Keyword(s): Outlier Detection, Data Streams, Approximate Solutions, Streaming Data, Detection Algorithms, Processing Power, Static Data, Large Memory, Two Phases, Local Outlier

Outlier detection is one of the important tasks in data mining. Detecting outliers over streaming data has become an important task in many applications, such as network analysis, fraud detection, and environment monitoring. One of the well-known outlier detection algorithms is the Local Outlier Factor (LOF). However, the original LOF has drawbacks that prevent its use with data streams: (1) it needs a lot of processing power (CPU) and large memory to detect the outliers; (2) it deals with static data, which means that on any change in the data LOF recalculates the outliers from the beginning over the whole dataset. These drawbacks pose big challenges for existing outlier detection algorithms in terms of their accuracy when they are implemented in a streaming environment. In this paper, we propose a new algorithm called GSILOF that focuses on detecting outliers from data streams using a genetic algorithm. GSILOF solves the problem of large memory requirements, as it has a fixed memory bound. GSILOF has two phases. First, the summarization phase summarizes the data that has already arrived. Second, the detection phase detects outliers in the newly arriving data. The summarization phase uses a genetic algorithm to try to find a subset of points that can represent the whole original set. Our experiments, conducted over real datasets, confirm the effectiveness of the proposed approach and the high quality of approximate solutions on a set of real-world streaming data.

Dynamic graph embedding for outlier detection on multiple meteorological time series

PLoS ONE, doi:10.1371/journal.pone.0247119, 2021, Vol 16 (2), pp. e0247119

Author(s): Gen Li, Jason J. Jung

Keyword(s): Time Series, Outlier Detection, Meteorological Data, Graph Embedding, Detection Methods, Dynamic Graph, Local Outlier Factor, Box Plot, Local Outlier, Isolation Forest

Existing dynamic graph embedding-based outlier detection methods mainly focus on the evolution of graphs and ignore the similarities among them. To overcome this limitation for the effective detection of abnormal climatic events from meteorological time series, we proposed a dynamic graph embedding model based on graph proximity, called DynGPE. Climatic events are represented as a graph where each vertex indicates meteorological data and each edge indicates a spurious relationship between two meteorological time series that are not causally related. The graph proximity is described as the distance between two graphs. DynGPE can cluster similar climatic events in the embedding space. Abnormal climatic events are distant from most of the other events and can be detected using outlier detection methods. We conducted experiments by applying three outlier detection methods (i.e., isolation forest, local outlier factor, and box plot) to real meteorological data. The results showed that DynGPE achieves better results than the baseline by 44.3% on average in terms of the F-measure. Isolation forest provides the best performance and stability. It achieved higher results than the local outlier factor and box plot methods, namely, by 15.4% and 78.9% on average, respectively.
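
Of the three detectors named above, the box plot rule is the simplest to sketch. The following is a minimal illustration of Tukey's box-plot fences applied to a one-dimensional score. The input values, the helper name `boxplot_outliers`, the 1.5 whisker multiplier, and the idea of scoring each event by a single distance are assumptions made for illustration; the abstract does not state how the authors configured the method.

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return (index, value) pairs lying outside Tukey's box-plot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]

# Hypothetical 1-D scores, e.g. each event's distance to the bulk of the embedding.
distances = [10, 12, 11, 13, 12, 11, 10, 12, 50]
flagged = boxplot_outliers(distances)
print(flagged)
```

Only the last value falls outside the fences here, so it is the single event flagged as abnormal.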

Unsupervised Feature Selection for Outlier Detection on Streaming Data to Enhance Network Security

Applied Sciences, doi:10.3390/app112412073, 2021, Vol 11 (24), pp. 12073

Author(s): Michael Heigl, Enrico Weigelt, Dalibor Fiala, Martin Schramm

Keyword(s): Feature Selection, Outlier Detection, Data Streams, State Of The Art, Streaming Data, Detection Methods, Unsupervised Feature Selection, Detection Algorithms, Efficient Detection, Selection For

Over the past couple of years, machine learning methods, especially outlier detection ones, have become established in the cybersecurity field to detect network-based anomalies rooted in novel attack patterns. However, the ubiquity of massive, continuously generated data streams poses an enormous challenge to efficient detection schemes and demands fast, memory-constrained online algorithms that are capable of dealing with concept drift. Feature selection plays an important role in improving outlier detection by identifying noisy data that contain irrelevant or redundant features. State-of-the-art work focuses either on unsupervised feature selection for data streams or on (offline) outlier detection. Substantial requirements for combining both fields are derived and compared with existing approaches. The comprehensive review reveals a research gap in unsupervised feature selection for the improvement of outlier detection methods in data streams. Thus, a novel algorithm for Unsupervised Feature Selection for Streaming Outlier Detection, denoted as UFSSOD, is proposed, which is able to perform unsupervised feature selection for the purpose of outlier detection on streaming data. Furthermore, it is able to determine the number of top-performing features by clustering their score values. A generic concept that shows two application scenarios of UFSSOD in conjunction with off-the-shelf online outlier detection algorithms has been derived. Extensive experiments have shown that a promising feature selection mechanism for streaming data is not applicable in the field of outlier detection. Moreover, UFSSOD, as an online-capable algorithm, yields comparable results to a state-of-the-art offline method trimmed for outlier detection.

Multivariate Anomaly Detection for Earth Observations: A Comparison of Algorithms and Feature Extraction Techniques

doi:10.5194/esd-2016-51, 2016, Cited By ~ 1

Author(s): Milan Flach, Fabian Gans, Alexander Brenning, Joachim Denzler, Markus Reichstein, ...

Keyword(s): Feature Extraction, Anomaly Detection, Data Streams, Multivariate Data, Detection Methods, Earth System, Earth System Science, System Science, Detection Algorithms, Earth Observations

Abstract. Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advancing our understanding of, e.g., vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring the effects of climatic extreme events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only a few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations. This artificial experiment is needed as there is no 'gold standard' for the identification of anomalies in real Earth observations. Our results show that a well-chosen feature extraction step (e.g. subtracting seasonal cycles, or dimensionality reduction) is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify three detection algorithms (k-nearest neighbours mean distance, kernel density estimation, a recurrence approach) and their combinations (ensembles) that outperform other multivariate approaches as well as univariate extreme event detection methods. Our results therefore provide an effective workflow to automatically detect anomalies in Earth system science data.
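
One workflow of the kind described above (a feature extraction step followed by a detector) can be sketched as follows: subtract the mean seasonal cycle from each variable, then score every time step by its k-nearest neighbours mean distance. This is a rough sketch under stated assumptions, not the study's implementation: the two synthetic variables, the period of 12, the injected anomaly at step 42, k=5, and the function names are all hypothetical.

```python
import math

def subtract_seasonal_cycle(series, period):
    """Subtract the mean seasonal cycle (average value at each phase of the period)."""
    phase_means = []
    for phase in range(period):
        values = series[phase::period]
        phase_means.append(sum(values) / len(values))
    return [x - phase_means[t % period] for t, x in enumerate(series)]

def knn_mean_distance(rows, k):
    """Score each row by the mean distance to its k nearest other rows."""
    scores = []
    for i, p in enumerate(rows):
        dists = sorted(math.dist(p, q) for j, q in enumerate(rows) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical data: two seasonal variables over six cycles, one injected anomaly.
PERIOD, CYCLES, ANOMALY_T = 12, 6, 42
steps = range(PERIOD * CYCLES)
temperature = [10 * math.sin(2 * math.pi * t / PERIOD) for t in steps]
humidity = [5 * math.cos(2 * math.pi * t / PERIOD) for t in steps]
temperature[ANOMALY_T] += 4.0  # stays inside the seasonal range of the variable
humidity[ANOMALY_T] += 3.0

# Feature extraction: remove the seasonal cycle from each variable separately.
residuals = list(zip(
    subtract_seasonal_cycle(temperature, PERIOD),
    subtract_seasonal_cycle(humidity, PERIOD),
))
# Detection: multivariate k-NN mean distance on the residuals.
anomaly_scores = knn_mean_distance(residuals, k=5)
detected = max(range(len(anomaly_scores)), key=anomaly_scores.__getitem__)
print("highest-scoring time step:", detected)
```

The injected values never leave the seasonal range of either variable, yet after the seasonal cycle is removed the anomalous step is the one with the largest score.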

Scalable KDE-based top-n local outlier detection over large-scale data streams

Knowledge-Based Systems, doi:10.1016/j.knosys.2020.106186, 2020, Vol 204, pp. 106186, Cited By ~ 2

Author(s): Fang Liu, Yanwei Yu, Peng Song, Yangyang Fan, Xiangrong Tong

Keyword(s): Outlier Detection, Data Streams, Large Scale, Large Scale Data, Scale Data, Local Outlier

A Novel Framework for Context-aware Outlier Detection in Big Data Streams

Journal of Digital Information Management, doi:10.6025/jdim/2018/16/5/213-222, 2018, Vol 16 (5), pp. 213, Cited By ~ 1

Author(s): Hussien Ahmad, Salah Dowaji

Keyword(s): Big Data, Outlier Detection, Data Streams, Context Aware, Big Data Streams

Recognition of Position Group Relationship in Dynamic Social Networks

International Journal of Innovative Technology and Exploring Engineering - Special Issue, doi:10.35940/ijitee.l2510.1081219, 2019, Vol 8 (12), pp. 118-121

Keyword(s): News Media, Online Social Network, Research Area, Detection Methods, Support Vector, Vector Method, Detection Algorithms, Network Intrusion, Dynamic Social Networks, Network Anomaly Detection

Mental stress is becoming a threat to people's health these days. With the rapid pace of life, more and more people are feeling stressed. A novel hybrid model combined with a Convolutional Neural Network (CNN) is used to leverage tweet content and social interaction information for effective stress detection. Network anomaly detection is an important and dynamic research area. Many network intrusion detection methods and systems (NIDS) have been proposed in the literature. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. The information provided by the online social network is limited. This method performs sentiment analysis of Facebook posts after topic formation using the Support Vector Method (SVM). After classifying whether or not a user is under stress, the k-nearest neighbor (KNN) algorithm is used to recommend a hospital on a map, and the admin can send the user a list of precautionary measures to help them stay healthy and happy in everyday life.

An efficient local outlier detection optimized by rough clustering

Journal of Intelligent & Fuzzy Systems, doi:10.3233/jifs-211433, 2021, pp. 1-12

Author(s): Chunyan She, Shaohua Zeng

Keyword(s): Outlier Detection, False Positive Rate, Time Cost, Local Distribution, Running Speed, Local Outlier Factor, Real World Applications, Positive Rate, Nearest Neighborhood, Local Outlier

Outlier detection is a hot issue in data mining, which has plenty of real-world applications. LOF (Local Outlier Factor) can capture the abnormal degree of objects in the dataset with different density levels, and many extended algorithms have been proposed in recent years. However, the LOF needs to search the nearest neighborhood of each object on the whole dataset, which greatly increases the time cost. Most of these extended algorithms only consider the distance between an object and its neighborhood, but ignore the local distribution of an object within its neighborhood, resulting in a high false-positive rate. To improve the running speed, a rough clustering based on triple fusion is proposed, which divides a dataset into several subsets and outlier detection is performed only on each subset. Then, considering the local distribution of an object within its neighborhood, a new local outlier factor is constructed to estimate the abnormal degree of each object. Finally, the experimental results indicate that the proposed algorithm has better performance and lower running time than the others.

Outlier Detection Methods for Uncovering of Critical Events in Historical Phasor Measurement Records

E3S Web of Conferences, doi:10.1051/e3sconf/20186408006, 2018, Vol 64, pp. 08006, Cited By ~ 1

Author(s): Kummerow André, Nicolai Steffen, Bretschneider Peter

Keyword(s): Power Systems, Outlier Detection, Training Data, Detection Methods, Data Sets, Critical Events, Failure Patterns, Detection Algorithms, Reduction Techniques, Dimension Reduction Techniques

The scope of this survey is the uncovering of potential critical events from mixed PMU data sets. An unsupervised procedure is introduced that uses different outlier detection methods. To this end, different signal analysis techniques are used to generate features in the time and frequency domains, together with linear and non-linear dimension reduction techniques. This approach enables the exploration of critical grid dynamics in power systems without prior knowledge about existing failure patterns. Furthermore, new failure patterns can be extracted to create training data sets for online detection algorithms.

A Fast and Efficient Local Outlier Detection in Data Streams

Proceedings of the 2019 International Conference on Image, Video and Signal Processing - IVSP 2019, doi:10.1145/3317640.3317653, 2019

Author(s): Xing Yang, Wenli Zhou, Nanfei Shu, Hao Zhang

Keyword(s): Outlier Detection, Data Streams, Local Outlier
