Outlier Detection Algorithm Basing on Similarity Measurement Relation

Outlier detection is an important field of data mining and is widely used in credit card fraud detection, network intrusion detection, and similar applications. Combining hierarchical clustering with similarity, the paper defines a similarity metric function for high-dimensional data and the concept of class density, redefines density-based outliers in high-dimensional space, and then presents an outlier detection algorithm based on similarity measurement. Experimental results suggest that the algorithm is of some value for outlier detection in high-dimensional data sets.
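
The abstract does not give the paper's similarity function or its definition of class density, so the following is only a generic, hypothetical illustration of combining hierarchical clustering with a similarity measure. It assumes a similarity s(x, y) = 1 / (1 + ||x - y||), cuts a single-linkage hierarchy at a similarity threshold, defines a hypothetical class density as cluster size times mean intra-cluster similarity, and flags points whose class density is low. The function names and the parameters `threshold` and `min_density` are illustrative assumptions.

```python
import numpy as np


def similarity(x, y):
    # Hypothetical similarity: 1 for identical points, decaying toward 0 with distance.
    return 1.0 / (1.0 + np.linalg.norm(x - y))


def single_linkage_clusters(X, threshold):
    """Cut a single-linkage hierarchy at `threshold`: points connected by a
    chain of pairs with similarity >= threshold share a cluster label."""
    n = len(X)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity(X[i], X[j]) >= threshold:
                parent[find(i)] = find(j)
    return np.array([find(i) for i in range(n)])


def class_density(X, labels):
    """Hypothetical class density: cluster size times mean pairwise
    similarity inside the cluster; a singleton gets density 1."""
    dens = {}
    for c in np.unique(labels):
        members = X[labels == c]
        m = len(members)
        if m == 1:
            dens[c] = 1.0
            continue
        sims = [similarity(members[a], members[b])
                for a in range(m) for b in range(a + 1, m)]
        dens[c] = m * float(np.mean(sims))
    return dens


def similarity_outliers(X, threshold=0.5, min_density=2.0):
    """Flag points whose class has a density below `min_density`."""
    labels = single_linkage_clusters(X, threshold)
    dens = class_density(X, labels)
    return np.array([dens[c] < min_density for c in labels])


rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 10)),   # dense class
    rng.normal(5.0, 0.1, size=(20, 10)),   # second dense class
    np.full((1, 10), 20.0),                # isolated point
])
print(np.flatnonzero(similarity_outliers(X)))
```

The pairwise loops are O(n²), which is acceptable for a small illustration but not for large data sets.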

Outlier Detection in the Framework of Dimensionality Reduction

International Journal of Pattern Recognition and Artificial Intelligence

10.1142/s0218001415500172

2015

Vol 29 (04)

pp. 1550017

Cited By ~ 3

Author(s):

Qiang Ye, Weifeng Zhi

Keyword(s):

Dimensionality Reduction, Outlier Detection, Nonlinear Models, High Dimensional Data, Detection Algorithm, High Dimensional, Dimensional Manifold, Data Set, Manifold Models, Low Dimensional

We propose an effective outlier detection algorithm for high-dimensional data. We consider manifold models of data, as is typically assumed in dimensionality reduction and manifold learning; namely, we consider a noisy data set sampled from a low-dimensional manifold in a high-dimensional data space. Our algorithm uses local geometric structure to determine the inliers, from which the outliers are identified. The algorithm is applicable to both linear and nonlinear models of data. We also discuss various implementation issues and present several examples that demonstrate the effectiveness of the new approach.
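
One common way to use local geometric structure, shown here as an assumption rather than as Ye and Zhi's actual algorithm, is local PCA: fit a low-dimensional affine subspace to each point's k nearest neighbours and measure how far the point lies from it. On a noisy manifold, inliers sit close to their local tangent subspace while off-manifold points do not. The function name and the parameters `k` (neighbourhood size) and `d` (assumed manifold dimension) are hypothetical.

```python
import numpy as np


def local_tangent_residuals(X, k=10, d=1):
    """For each point, fit a d-dimensional affine subspace (local PCA) to its
    k nearest neighbours, excluding the point itself, and return the distance
    from the point to that subspace. Larger values suggest outliers."""
    n = len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    scores = np.empty(n)
    for i in range(n):
        order = np.argsort(sq[i])
        nbrs = order[order != i][:k]
        P = X[nbrs]
        mu = P.mean(axis=0)
        # Rows of Vt are the neighbourhood's principal directions.
        _, _, Vt = np.linalg.svd(P - mu, full_matrices=False)
        basis = Vt[:d]
        r = X[i] - mu
        # Remove the component lying in the local tangent subspace.
        scores[i] = np.linalg.norm(r - basis.T @ (basis @ r))
    return scores


# Noisy circle (a 1-D manifold) in 3-D space plus one point lifted off it.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
X = X + rng.normal(scale=0.01, size=X.shape)
X = np.vstack([X, [1.0, 0.0, 0.3]])
scores = local_tangent_residuals(X, k=10, d=1)
print(int(np.argmax(scores)))
```

Because the subspace is fitted locally, the same residual works for curved (nonlinear) manifolds as well as flat ones, provided k is small enough that each neighbourhood is approximately flat.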

A Novel Density-based Technique for Outlier Detection of High Dimensional Data Utilizing Full Feature Space

Information Technology And Control

10.5755/j01.itc.50.1.25588

2021

Vol 50 (1)

pp. 138-152

Author(s):

Mujeeb Ur Rehman, Dost Muhammad Khan

Keyword(s):

Data Mining, Outlier Detection, High Dimensional Data, Research Work, Feature Space, High Dimensional, Data Set, Data Points, Low Dimensional, Intrinsic Feature

Anomaly detection has recently attracted growing attention from data mining researchers as its importance has steadily increased in practical domains such as product marketing, fraud detection, medical diagnosis, and fault detection. Outlier detection in high-dimensional data poses exceptional challenges because of the curse of dimensionality and the resulting similarity between distant and nearby points. Traditional algorithms and techniques for outlier detection operate on the full feature space. Conventional methodologies concentrate largely on low-dimensional data and are therefore ineffective at discovering anomalies in data sets with a large number of dimensions. Digging out the anomalies in a high-dimensional data set becomes difficult and tedious when all subsets of projections need to be explored. In high-dimensional data, all data points tend to look alike because of an intrinsic property: the relative difference between the distances to the nearest and the farthest observations approaches zero as the number of dimensions grows toward infinity. This research proposes a novel technique that measures the deviation among all data points and embeds the findings into well-established density-based techniques. It is a state-of-the-art technique in that it opens a new line of research toward resolving the inherent problems of high-dimensional data in which outliers reside within clusters of different densities. A high-dimensional data set from the UCI Machine Learning Repository is used to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
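
The distance-concentration effect described above can be observed numerically. The sketch below is a generic demonstration, not part of the proposed technique: it draws points uniformly from the unit hypercube and computes the relative contrast (d_max - d_min) / d_min of the distances from a random query point. The function name `relative_contrast` and the sample sizes are illustrative choices.

```python
import numpy as np


def relative_contrast(n_points, dim, rng):
    """(d_max - d_min) / d_min for the Euclidean distances from one random
    query point to n_points samples drawn uniformly from [0, 1]^dim."""
    X = rng.random((n_points, dim))
    q = rng.random(dim)
    dist = np.linalg.norm(X - q, axis=1)
    return (dist.max() - dist.min()) / dist.min()


rng = np.random.default_rng(42)
for dim in (2, 10, 100, 1000):
    print(dim, round(float(relative_contrast(1000, dim, rng)), 3))
```

As the dimension grows the printed contrast shrinks, so "nearest" and "farthest" neighbours become hard to tell apart, which is what weakens purely distance-based and density-based scores in full feature space.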

A System for Outlier Detection of High Dimensional Data

International Journal of Computer Science and Informatics

10.47893/ijcsi.2012.1037

2012

pp. 197-201

Author(s):

Bharat Gupta, Durga Toshniwal

Keyword(s):

Outlier Detection, High Dimensional Data, Research Problem, High Dimensional, Full Data, Data Set, Detection Techniques, New Concepts, Low Dimensional, Important Research Problem

In high-dimensional data, a large number of outliers are embedded in low-dimensional subspaces; these are known as projected outliers. Most existing outlier detection techniques cannot find them because they search for abnormal patterns in the full data space, which makes outlier detection in high-dimensional data an important research problem. This paper proposes an approach for outlier detection in high-dimensional data that modifies the existing SPOT approach by adding three new concepts: adaptation of the Sparse Sub-Space Template (SST), different combinations of PCS parameters, and a set of non-outlying cells for the testing data set.
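
SPOT's Sparse Sub-Space Template and PCS parameters are not described in this abstract, so the following is not SPOT; it is a simplified, hypothetical illustration of what a projected outlier is. Each attribute is split into equal-width bins, points are counted in every cell of every two-attribute subspace, and a point is reported together with the subspaces in which it falls into a sparse cell. The function name and the parameters `bins` and `min_count` are assumptions.

```python
import itertools

import numpy as np


def projected_outliers(X, bins=5, min_count=2):
    """Map each flagged point index to the list of 2-D subspaces (i, j) in
    which it falls into a grid cell holding fewer than `min_count` points."""
    D = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)           # avoid division by zero
    # Equal-width bin index per value; the clip puts each maximum in the last bin.
    cells = np.clip(((X - lo) / span * bins).astype(int), 0, bins - 1)
    found = {}
    for i, j in itertools.combinations(range(D), 2):
        keys = cells[:, i] * bins + cells[:, j]       # cell id in subspace (i, j)
        counts = np.bincount(keys, minlength=bins * bins)
        for p in np.flatnonzero(counts[keys] < min_count):
            found.setdefault(int(p), []).append((i, j))
    return found


# Attributes 0 and 1 are strongly correlated; the last point breaks that
# correlation although each of its values is ordinary on its own.
rng = np.random.default_rng(1)
x0 = rng.random(200)
X = np.column_stack([x0, x0 + rng.normal(0.0, 0.02, 200), rng.random(200)])
X = np.vstack([X, [0.05, 0.95, 0.5]])
print(projected_outliers(X).get(200))
```

Enumerating all attribute pairs already requires D(D - 1)/2 subspaces, which hints at why exhaustive subspace search becomes expensive as the dimension grows.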

Detecting Outliers in High Dimensional Data Sets using Z-Score Methodology

International Journal of Innovative Technology and Exploring Engineering - Special Issue

10.35940/ijitee.a3910.119119

2019

Vol 9 (1)

pp. 48-53

Keyword(s):

Outlier Detection, Credit Card, High Dimensional Data, Research Area, High Dimensional, Data Sets, Z Score, Wide Range, Efficiency And Effectiveness, Projected Methods

Outlier detection is an interesting research area in machine learning. With recently emerging tools and varied applications, interest in outlier recognition is growing significantly. A significant number of outlier detection approaches have recently been proposed and effectively applied in a wide range of fields, including medical health, credit card fraud, and intrusion detection. They can also be used for conventional data analysis. Outlier recognition aims to discover patterns in data that do not conform to expected behavior. In this paper, we present a statistical approach, the Z-score method, for outlier recognition in high-dimensional data. The method uses Z-scores to identify distant data points based on their positions in the data distribution. The proposed method is computationally fast and robust for outlier recognition. A comparative analysis with existing methods is carried out on high-dimensional datasets. Experimental results demonstrate the improved performance, efficiency, and effectiveness of the proposed method.
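
A minimal sketch of per-feature Z-score screening follows. A point is flagged if any of its features has |z| = |x - mean| / std above a threshold; the threshold of 3 standard deviations is a common convention assumed here rather than taken from the paper, and the function name is hypothetical.

```python
import numpy as np


def zscore_outliers(X, threshold=3.0):
    """Flag rows in which at least one feature lies more than `threshold`
    standard deviations from that feature's mean."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0          # constant features then get z = 0
    z = (X - mu) / sigma
    return np.any(np.abs(z) > threshold, axis=1)


X = np.zeros((50, 3))
X[:25] = 1.0                         # every feature takes the values 0 and 1
X[7, 2] = 100.0                      # one extreme value in feature 2
print(np.flatnonzero(zscore_outliers(X)))
```

Because the mean and standard deviation are themselves pulled by extreme values, one large outlier can inflate the standard deviation and mask smaller ones; replacing them with the median and the median absolute deviation is a common robust variant.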

An Efficient Method to Detect Outliers in High Dimensional Data

Journal of Computational and Theoretical Nanoscience

10.1166/jctn.2019.8274

2019

Vol 16 (9)

pp. 3938-3944

Author(s):

Atul Garg, Kamaljeet Kaur

Keyword(s):

Outlier Detection, High Dimensional Data, Detection Algorithm, High Dimensional, The Novel, Linear Discriminant, Detection Techniques, Detection Algorithms, Large Sets, Detection Of Outliers

Detecting outliers or anomalies in high-dimensional data is a major challenge today. Outlier detection techniques distinguish normal data from anomalous data by classifying new data as normal or abnormal. Many researchers have proposed outlier detection algorithms for high-dimensional data, each with its own benefits and limitations. This work considers several of them: the Dice-Coefficient Index (DCI), the MapReduce function, and Linear Discriminant Analysis (LDA). The MapReduce function is used to handle large datasets, and LDA is used to reduce the dimensionality of the data. A novel Hybrid Outlier Detection Algorithm (HbODA) is proposed for the efficient detection of outliers in high-dimensional data. Its performance is analyzed in terms of efficiency, accuracy, computation cost, precision, recall, and related measures. Experimental results on large real data sets show that the proposed algorithm detects outliers better than other traditional methods.
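
The abstract names the Dice-Coefficient Index but does not say how HbODA combines it with MapReduce and LDA. The sketch below shows only the Dice coefficient itself, 2|A ∩ B| / (|A| + |B|) for binary attribute vectors, together with one hypothetical way to turn it into an outlier score: one minus a record's mean Dice similarity to all other records. The function names and the scoring rule are assumptions.

```python
import numpy as np


def dice_similarity_matrix(B):
    """Pairwise Dice coefficients 2|A & B| / (|A| + |B|) between the rows of
    a binary matrix; two all-zero rows get similarity 1 by convention."""
    B = np.asarray(B, dtype=bool).astype(float)
    inter = B @ B.T                              # |A & B| for every pair of rows
    sizes = B.sum(axis=1)
    denom = sizes[:, None] + sizes[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        S = np.where(denom > 0, 2.0 * inter / denom, 1.0)
    return S


def dice_outlier_scores(B):
    """Hypothetical score: 1 - mean Dice similarity to all other rows."""
    S = dice_similarity_matrix(B)
    np.fill_diagonal(S, 0.0)                     # ignore self-similarity
    return 1.0 - S.sum(axis=1) / (len(S) - 1)


B = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],                          # shares no attribute with the others
])
print(dice_outlier_scores(B).round(3))
```

The pairwise product `B @ B.T` is the kind of all-pairs computation that a MapReduce-style split over row blocks would distribute on large data sets.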

A New Outlier Detection Algorithms Based on Markov Chain

Advanced Materials Research

10.4028/www.scientific.net/amr.366.456

2011

Vol 366

pp. 456-459

Cited By ~ 3

Author(s):

Jun Yang, Ying Long Wang

Keyword(s):

Markov Chain, Outlier Detection, High Dimensional Data, Weighted Graph, Real Data, Curse Of Dimensionality, High Dimensional, Large Set, Data Set, Novel Approach

Detecting outliers in a large set of data objects is a major data mining task that aims to find the different mechanisms responsible for different groups of objects in a data set. In high-dimensional data, existing approaches are bound to deteriorate because of the notorious “curse of dimensionality”. In this paper, we propose a novel approach named ODMC (Outlier Detection Based On Markov Chain), which alleviates the effects of the “curse of dimensionality” compared with purely distance-based approaches. A main advantage of the new approach is that it uses a major feature of an undirected weighted graph to calculate the outlier degree of each node. In a thorough experimental evaluation, we compare ODMC with ABOD and FindFPOF on various artificial and real data sets and show that ODMC performs especially well on high-dimensional data.
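
The abstract does not define which graph feature ODMC uses, so the following is a hypothetical sketch of a common Markov-chain formulation rather than ODMC itself. Points become nodes of an undirected graph with Gaussian similarity weights, the weights are row-normalised into a random-walk transition matrix with PageRank-style teleportation, and the stationary distribution is found by power iteration; nodes the walk rarely visits are outlier candidates. The bandwidth `sigma`, the `damping` factor, and the function name are assumptions.

```python
import numpy as np


def markov_outlier_scores(X, sigma=1.0, damping=0.85, iters=1000, tol=1e-12):
    """Stationary probability of each point under a random walk on a
    Gaussian-similarity graph; low values suggest outliers."""
    n = len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))       # symmetric (undirected) edge weights
    np.fill_diagonal(W, 0.0)                   # no self-loops
    rows = W.sum(axis=1, keepdims=True)
    # Row-normalise into a transition matrix; a node whose weights all
    # underflow to zero jumps uniformly instead of producing NaNs.
    P = np.where(rows > 0, W / np.where(rows > 0, rows, 1.0), 1.0 / n)
    # PageRank-style teleportation keeps the chain irreducible and aperiodic,
    # so the stationary distribution is unique.
    P = damping * P + (1.0 - damping) / n
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        nxt = pi @ P
        done = np.abs(nxt - pi).sum() < tol
        pi = nxt
        if done:
            break
    return pi


rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 5)), np.full((1, 5), 8.0)])
pi = markov_outlier_scores(X)
print(int(np.argmin(pi)), round(float(pi.sum()), 6))
```

The isolated point receives almost no probability mass from its neighbours, so its stationary probability stays close to the teleportation floor (1 - damping) / n.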

Outlier Detection in High Dimensional Data

Journal of Information & Knowledge Management

10.1142/s0219649220400134

2020

Vol 19 (01)

pp. 2040013

Cited By ~ 5

Author(s):

Firuz Kamalov, Ho Hon Leung

Keyword(s):

Outlier Detection, High Dimensional Data, Real Life, Principal Component, Original Data, Detection Algorithm, High Dimensional, Detection Algorithms, Real Life Data, Better Than

High-dimensional data poses unique challenges for outlier detection. Most existing algorithms fail to properly address the issues stemming from a large number of features; in particular, outlier detection algorithms perform poorly on small datasets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method addresses the challenges of high-dimensional data by projecting the original data onto a smaller space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by [Formula: see text]-score. Our method also produces better-than-average execution times compared with the benchmark methods.
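
The abstract names the two ingredients, a PCA projection and kernel density estimation, but not how the anomaly score is formed. The sketch below combines them under stated assumptions and is not Kamalov and Leung's exact scoring: it projects the centred data onto the top principal components via SVD, estimates a leave-one-out Gaussian KDE density for each projected point with a fixed bandwidth, and uses the negative log density as the score. The function name, `n_components`, and `bandwidth` are hypothetical.

```python
import numpy as np


def pca_kde_scores(X, n_components=2, bandwidth=0.5):
    """Anomaly score = -log of a leave-one-out Gaussian KDE density estimated
    in the space of the top principal components (higher = more anomalous)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T
    n, k = Z.shape
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * bandwidth ** 2))
    np.fill_diagonal(K, 0.0)                   # leave each point out of its own estimate
    norm = (n - 1) * (2.0 * np.pi * bandwidth ** 2) ** (k / 2)
    density = K.sum(axis=1) / norm
    return -np.log(density + 1e-300)           # guard against log(0)


# Few samples, many features: 41 points in 100 dimensions lying near a 2-D plane.
rng = np.random.default_rng(7)
latent = np.vstack([rng.normal(size=(40, 2)), [6.0, 6.0]])
loadings = rng.normal(size=(2, 100)) / np.sqrt(100)
X = latent @ loadings + rng.normal(scale=0.05, size=(41, 100))
s = pca_kde_scores(X)
print(int(np.argmax(s)))
```

Estimating the density in two projected dimensions rather than in all 100 avoids the sparsity that makes kernel density estimates unreliable when there are far fewer samples than features.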
