An Efficient Distance and Density Based Outlier Detection Approach

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.155-156.342 ◽

2012 ◽

Vol 155-156 ◽

pp. 342-347 ◽

Cited By ~ 1

Author(s):

Xun Biao Zhong ◽

Xiao Xia Huang

Keyword(s):

Outlier Detection ◽

Real World ◽

High Performance ◽

Detection Problem ◽

Empirical Results ◽

Detection Approach ◽

Real World Datasets ◽

Good Detection

To address the low accuracy and high computational cost of density-based outlier detection, this paper proposes a variance of distance and density (VDD) measure. The proposed k-means clustering and score-based VDD (KSVDD) approach detects outliers efficiently and with high performance. Two real-world datasets are used to illustrate the feasibility of the approach, and empirical results show that KSVDD achieves good detection precision.
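The abstract does not define the VDD measure, so the sketch below shows only the general pattern it names: cluster with k-means, then rank points by a score. The score used here is a hypothetical stand-in, not VDD. It divides a point's distance to its own centroid by the mean such distance in its cluster, which combines a distance term with a rough per-cluster density normalization. The function names and the farthest-first seeding are assumptions made to keep the example deterministic.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def cluster_relative_scores(points, k):
    """Hypothetical stand-in for VDD: distance to the point's own centroid
    divided by the mean such distance in its cluster (larger = more outlying)."""
    centroids, labels = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    scores = []
    for d, lab in zip(dists, labels):
        same = [dd for dd, ll in zip(dists, labels) if ll == lab]
        mean_d = sum(same) / len(same)
        scores.append(d / mean_d if mean_d > 0 else 0.0)
    return scores


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (3, 3)]
scores = cluster_relative_scores(pts, k=2)
print(max(range(len(pts)), key=scores.__getitem__))  # index of the top-scoring point
```

Farthest-first seeding can pick an extreme outlier as a centroid and give it a cluster of its own (score 0 here), so this score alone should not be read as a robust detector.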

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge because it plays an important role in many applications such as medical data analysis, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively and promptly on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.

Adaptive Double-Exploration Tradeoff for Outlier Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6164 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6837-6844

Author(s):

Xiaojin Zhang ◽

Honglei Zhuang ◽

Shengyu Zhang ◽

Yuan Zhou

Keyword(s):

Confidence Interval ◽

Outlier Detection ◽

Real World ◽

Efficient Algorithm ◽

Experimental Results ◽

Sample Complexity ◽

Bandit Problem ◽

Real World Datasets ◽

Synthetic Datasets ◽

The Individual

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Unlike in the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner therefore needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms against exploring the outlier threshold, we provide an algorithm that is efficient in terms of sample complexity. Experimental results on both synthetic and real-world datasets demonstrate the efficiency of our algorithm.
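To make the "threshold depends on all arms" setup concrete, the sketch below is a deliberately naive, non-adaptive baseline: it pulls every arm equally often and then applies a threshold computed from all estimated means. It assumes, for illustration, a k-sigma rule (threshold = mean of arm means + k times their standard deviation) and Gaussian reward noise; the function name and parameters are hypothetical. The paper's contribution is precisely to allocate pulls adaptively between arms and threshold, which this sketch does not do.

```python
import random
import statistics


def uniform_outlier_arms(arm_means, k=2.0, pulls_per_arm=2000, noise=0.1, seed=0):
    """Non-adaptive baseline for 'double exploration':
    1. pull every arm the same number of times (simulated Gaussian rewards),
    2. estimate each arm's mean reward,
    3. set threshold = mean + k * std of the estimated means (k-sigma rule),
    4. flag arms whose estimate lies above the threshold."""
    rng = random.Random(seed)
    estimates = []
    for mu in arm_means:
        samples = [rng.gauss(mu, noise) for _ in range(pulls_per_arm)]
        estimates.append(sum(samples) / pulls_per_arm)
    threshold = statistics.mean(estimates) + k * statistics.pstdev(estimates)
    outliers = [i for i, m in enumerate(estimates) if m > threshold]
    return outliers, threshold


arms, thr = uniform_outlier_arms([0.5] * 9 + [0.9])
print(arms, round(thr, 3))
```

Because the threshold itself is estimated from noisy arm means, uniform sampling wastes pulls on arms that are clearly far from it; an adaptive scheme would stop sampling those arms early.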

FairLOF: Fairness in Outlier Detection

Data Science and Engineering ◽

10.1007/s41019-021-00169-x ◽

2021 ◽

Author(s):

Deepak P ◽

Savitha Sam Abraham

Keyword(s):

Marital Status ◽

Outlier Detection ◽

Real World ◽

Detection Method ◽

Empirical Evaluation ◽

Evaluation Framework ◽

Real World Datasets

An outlier detection method may be considered fair over specified sensitive attributes if the results of outlier detection are not skewed toward particular groups defined on such sensitive attributes. In this paper, we consider the task of fair outlier detection. Our focus is on fair outlier detection over multiple multi-valued sensitive attributes (e.g., gender, race, religion, nationality and marital status, among others), a task with broad applications across modern data scenarios. We propose a fair outlier detection method, FairLOF, that is inspired by the popular LOF formulation for neighborhood-based outlier detection. We outline ways in which unfairness could be induced within LOF and develop three heuristic principles to enhance fairness, which form the basis of the FairLOF method. Because this is a novel task, we develop an evaluation framework for fair outlier detection and use it to benchmark FairLOF on the quality and fairness of its results. Through an extensive empirical evaluation over real-world datasets, we illustrate that FairLOF achieves significant improvements in fairness at the cost of sometimes marginal degradation in result quality, as measured against the fairness-agnostic LOF method. We also show that a generalization of our method, named FairLOF-Flex, opens possibilities for deepening fairness in outlier detection beyond what FairLOF offers.
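Since FairLOF is built on LOF, a minimal brute-force sketch of the standard, fairness-agnostic Local Outlier Factor (Breunig et al.) may help: k-distance, reachability distance, local reachability density (lrd), and the ratio of neighbours' lrd to the point's own lrd. It does not include any of FairLOF's fairness heuristics. For simplicity it assumes no duplicate points and ignores ties at the k-th distance, so each neighbourhood has exactly k members.

```python
import math


def lof_scores(points, k=3):
    """Brute-force O(n^2) Local Outlier Factor.
    Scores near 1 mean 'as dense as the neighbours'; clearly above 1 means outlying.
    Assumes distinct points (duplicates would give infinite lrd)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself
    neighbors = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                 for i in range(n)]
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # reachability distance of point i with respect to neighbour j
        return max(k_distance[j], dist[i][j])

    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)
    # LOF: average ratio of the neighbours' density to the point's own density
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print([round(s, 2) for s in lof_scores(pts, k=2)])
```

Points in the unit square get LOF 1.0 here, while the isolated point scores about 6. FairLOF, per the abstract, modifies how this neighbourhood-based score behaves across sensitive-attribute groups.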

A Grey-Box Ensemble Model Exploiting Black-Box Accuracy and White-Box Intrinsic Interpretability

Algorithms ◽

10.3390/a13010017 ◽

2020 ◽

Vol 13 (1) ◽

pp. 17 ◽

Cited By ~ 4

Author(s):

Emmanuel Pintelas ◽

Ioannis E. Livieris ◽

Panagiotis Pintelas

Keyword(s):

Machine Learning ◽

Real World ◽

High Performance ◽

Black Box ◽

Box Model ◽

Proposed Model ◽

Wide Range ◽

Critical Issues ◽

Key Factor ◽

Real World Datasets

Machine learning has emerged as a key factor in many technological and scientific advances and applications. Much research has been devoted to developing high-performance machine learning models that can make very accurate predictions and decisions across a wide range of applications. Nevertheless, we still seek to understand and explain how these models work and make decisions. Explainability and interpretability in machine learning are significant issues, since in most real-world problems it is considered essential to understand and explain a model's prediction mechanism in order to trust it and make decisions on critical issues. In this study, we developed a Grey-Box model based on a semi-supervised methodology utilizing a self-training framework. The main objective of this work is to develop a machine learning model that is both interpretable and accurate, although this is a complex and challenging task. The proposed model was evaluated on a variety of real-world datasets from the crucial application domains of education, finance and medicine. Our results demonstrate the efficiency of the proposed model, which performs comparably to a Black-Box model and considerably outperforms single White-Box models, while remaining as interpretable as a White-Box model.
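A minimal sketch of a generic self-training loop may clarify how a Grey-Box model can combine the two kinds of learner. It assumes that an accurate but opaque "black-box" learner pseudo-labels unlabeled data and that an interpretable "white-box" learner is then fitted on the enlarged labeled set. The abstract does not name the base learners, so k-NN (black-box stand-in) and a single-threshold decision stump (white-box stand-in), as well as all function names, are hypothetical choices for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=2):
    """Black-box stand-in: k-NN vote; confidence = share of the winning vote."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    label, votes = Counter(train_y[i] for i in nearest).most_common(1)[0]
    return label, votes / len(nearest)


def self_train(labeled_x, labeled_y, unlabeled_x, k=2, min_conf=1.0, max_rounds=10):
    """Each round, pseudo-label the unlabeled points the teacher is confident
    about and move them into the labeled set; stop when none qualify."""
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    for _ in range(max_rounds):
        preds = [(x,) + knn_predict(xs, ys, x, k) for x in pool]
        confident = [(x, y) for x, y, conf in preds if conf >= min_conf]
        if not confident:
            break
        xs += [x for x, _ in confident]
        ys += [y for _, y in confident]
        pool = [x for x, _, conf in preds if conf < min_conf]
    return xs, ys


def fit_stump(xs, ys, feature=0):
    """White-box student: the single rule 'x[feature] > t -> class 1'.
    Returns the midpoint threshold t with the fewest training errors."""
    values = sorted({x[feature] for x in xs})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates,
               key=lambda t: sum((x[feature] > t) != bool(y) for x, y in zip(xs, ys)))


xs, ys = self_train([(0.0,), (1.0,), (9.0,), (10.0,)], [0, 0, 1, 1],
                    [(2.0,), (3.0,), (7.0,), (8.0,)])
print(len(xs), fit_stump(xs, ys))
```

The final model is just the stump, so it stays fully readable as one rule, while the labels it was trained on partly came from the opaque teacher.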

An Improved Semisupervised Outlier Detection Algorithm Based on Adaptive Feature Weighted Clustering

Mathematical Problems in Engineering ◽

10.1155/2016/6394253 ◽

2016 ◽

Vol 2016 ◽

pp. 1-14 ◽

Cited By ~ 3

Author(s):

Tingquan Deng ◽

Jinhong Yang

Keyword(s):

Outlier Detection ◽

Real World ◽

Detection Algorithm ◽

Membership Degree ◽

Weighted Clustering ◽

Detection Strategy ◽

Series Of Experiments ◽

Effectiveness And Efficiency ◽

Real World Datasets ◽

Normal Object

Various approaches to outlier detection already exist, among which semisupervised methods show encouraging advantages owing to the prior knowledge they introduce. In this paper, an adaptive feature-weighted, clustering-based semisupervised outlier detection strategy is proposed. This method maximizes the membership degree of a labeled normal object to the cluster it belongs to and minimizes the membership degrees of a labeled outlier to all clusters. Because features or components of a dataset differ in their significance for determining whether an object is an inlier or an outlier, each feature is adaptively assigned a weight according to the degree of deviation between that feature's values over all objects and its value in a given cluster prototype. A series of experiments on a synthetic dataset and several real-world datasets is conducted to verify the effectiveness and efficiency of the proposal.

Selective oversampling approach for strongly imbalanced data

PeerJ Computer Science ◽

10.7717/peerj-cs.604 ◽

2021 ◽

Vol 7 ◽

pp. e604

Author(s):

Peter Gnip ◽

Liberios Vokorokos ◽

Peter Drotár

Keyword(s):

Outlier Detection ◽

Real World ◽

State Of The Art ◽

Imbalanced Data ◽

Prediction Performance ◽

Classifier Performance ◽

Real World Applications ◽

Real World Datasets ◽

Synthetic Datasets ◽

Representative Samples

Challenges posed by imbalanced data are encountered in many real-world applications. One possible approach to improving classifier performance on imbalanced data is oversampling. In this paper, we propose a new selective oversampling approach (SOA) that first isolates the most representative samples from minority classes by using an outlier detection technique and then utilizes these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely the synthetic minority oversampling technique and adaptive synthetic sampling. Prediction performance is evaluated on four synthetic datasets and four real-world datasets, and the proposed SOA methods always achieved performance equal to or better than that of the other existing oversampling methods considered.
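The two-stage idea (filter minority samples with an outlier detector, then oversample from what remains) can be sketched as follows. This is not the authors' SOA implementation: the abstract does not say which outlier detector is used, so the k-th nearest-neighbour distance score and the `drop_frac` cut-off are assumptions. The second stage is the standard SMOTE interpolation, new = x + u(neighbour - x) with u drawn from [0, 1).

```python
import math
import random


def kth_nn_distance(points, k):
    """Distance to the k-th nearest other point: a simple stand-in
    outlier score (larger = more isolated)."""
    return [sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[k - 1]
            for i, p in enumerate(points)]


def selective_smote(minority, n_new, k=2, drop_frac=0.2, seed=0):
    """Stage 1: drop the most outlying minority samples (hypothetical rule).
    Stage 2: SMOTE-style interpolation between the retained samples."""
    rng = random.Random(seed)
    scores = kth_nn_distance(minority, k)
    keep_n = max(k + 1, round(len(minority) * (1 - drop_frac)))
    order = sorted(range(len(minority)), key=scores.__getitem__)
    core = [minority[i] for i in order[:keep_n]]
    synthetic = []
    for _ in range(n_new):
        b = rng.randrange(len(core))
        # k nearest neighbours of the base sample within the retained core
        nbrs = sorted((j for j in range(len(core)) if j != b),
                      key=lambda j: math.dist(core[b], core[j]))[:k]
        nb = core[rng.choice(nbrs)]
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + u * (y - x) for x, y in zip(core[b], nb)))
    return core, synthetic


core, syn = selective_smote([(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)], n_new=5)
print(core)
```

Without the filtering stage, plain SMOTE could interpolate between the square and the isolated point (8, 8), scattering synthetic minority samples across the gap; filtering first keeps them inside the representative region.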

A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-642-01307-2_84 ◽

2009 ◽

pp. 813-822 ◽

Cited By ~ 108

Author(s):

Ke Zhang ◽

Marcus Hutter ◽

Huidong Jin

Keyword(s):

Outlier Detection ◽

Real World ◽

Real World Data ◽

World Data ◽

Detection Approach ◽

Local Distance

Outlier Detection Strategy Using the Self-Organizing Map

Knowledge Discovery and Data Mining ◽

10.4018/978-1-59904-252-7.ch012 ◽

2011 ◽

pp. 224-243 ◽

Cited By ~ 3

Author(s):

Fedja Hadzic ◽

Tharam S. Dillon

Keyword(s):

Outlier Detection ◽

Real World ◽

The Self ◽

Continuous Data ◽

Self Organizing Map ◽

Concept Hierarchy ◽

Analysis Strategy ◽

Real World Datasets ◽

Output Space ◽

Self Organizing

Real-world datasets are often accompanied by various types of anomalous or exceptional entries, which are often referred to as outliers. Detecting outliers and distinguishing noise from true exceptions is important for effective data mining. This chapter presents two methods for outlier detection and analysis using the self-organizing map (SOM), one more suitable for categorical data and the other for continuous data. Both are generally based on filtering out the instances that are not captured by, or are contradictory to, the obtained concept hierarchy for the domain. We demonstrate how the dimension of the output space plays an important role in the kind of patterns that will be detected as outlying. Furthermore, the concept hierarchy itself provides extra criteria for distinguishing noise from true exceptions. The effectiveness of the proposed outlier detection and analysis strategy is demonstrated through experiments on publicly available real-world datasets.

On the Improvement of the Isolation Forest Algorithm for Outlier Detection with Streaming Data

Electronics ◽

10.3390/electronics10131534 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1534

Author(s):

Michael Heigl ◽

Kumar Ashutosh Anand ◽

Andreas Urmann ◽

Dalibor Fiala ◽

Martin Schramm ◽

...

Keyword(s):

Outlier Detection ◽

Real World ◽

High Speed ◽

State Of The Art ◽

High Volume ◽

Streaming Data ◽

Steady Increase ◽

Efficient Detection ◽

Real World Datasets ◽

Isolation Forest

In recent years, detecting anomalies in real-world computer networks has become an increasingly challenging task due to the steady increase of high-volume, high-speed and high-dimensional streaming data, for which ground truth information is not available. Efficient detection schemes applied on networked embedded devices need to be fast and memory-constrained, and must be capable of dealing with concept drifts when they occur. Different approaches for unsupervised online outlier detection have been designed to deal with these circumstances in order to reliably detect malicious activity. In this paper, we introduce a novel framework called PCB-iForest, which, in its generalized form, can incorporate any ensemble-based online outlier detection (OD) method to function on streaming data. The most popular state-of-the-art online methods, with an in-depth focus on variants of the widely accepted isolation forest algorithm, are compared against carefully engineered requirements, highlighting the lack of a flexible and efficient solution; PCB-iForest satisfies these requirements. We therefore integrate two variants into PCB-iForest: an isolation forest improvement called extended isolation forest, and a classic isolation forest variant equipped with the functionality to score features according to their contributions to a sample's anomalousness. Extensive experiments were performed on 23 different multidisciplinary and security-related real-world datasets to comprehensively evaluate the performance of our implementation against off-the-shelf methods. The discussion of results, covering AUC, F1 score and averaged execution time, shows that PCB-iForest clearly outperformed the state-of-the-art competitors in 61% of cases and achieved even more promising results in terms of the tradeoff between classification and computational costs.
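For readers unfamiliar with the base algorithm, the sketch below shows classic batch isolation forest scoring (Liu, Ting and Zhou): random axis-parallel splits isolate points, anomalies end up with short average path lengths, and the score is s = 2^(-E[h(x)] / c(psi)), where c(n) is the average path length of an unsuccessful binary-search-tree search over n points. It does not cover PCB-iForest's streaming and drift handling, the extended isolation forest, or feature scoring. The subsample size and tree count are illustrative defaults, not values from the paper.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value in its range."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo, hi = min(p[dim] for p in points), max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split, build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus c(size) to account for unsplit leaf points."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def isolation_forest_scores(points, n_trees=100, sample_size=64, seed=0):
    """Anomaly scores in (0, 1]; values near 1 indicate likely outliers.
    Requires at least 2 points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return [2.0 ** (-(sum(path_length(t, x) for t in trees) / n_trees) / c(psi))
            for x in points]


pts = [(i / 5, j / 4) for i in range(6) for j in range(5)] + [(10.0, 10.0)]
scores = isolation_forest_scores(pts)
print(round(scores[-1], 2), round(max(scores[:-1]), 2))
```

The isolated point (10, 10) is usually separated by the first split, so its score is far higher than that of any grid point; an online variant must additionally decide when to rebuild or retire trees as the stream drifts.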

Development of a High Performance Multi-Physics Finite Difference Model for Use in a Monte Carlo Simulation with Real World Distributions

10.1615/tfec2017.cfd.017687 ◽

2017 ◽

Cited By ~ 1

Author(s):

Joseph R. VanderVeer

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Finite Difference ◽

Real World ◽

High Performance ◽

Difference Model ◽

Finite Difference Model