scholarly journals Outlier Detection Algorithm Basing on Similarity Measurement Relation

2012 ◽  
Vol 6-7 ◽  
pp. 621-624
Author(s):  
Hong Bin Fang

Outlier detection is an important field of data mining, which is widely used in credit card fraud detection, network intrusion detection ,etc. A kind of high dimensional data similarity metric function and the concept of class density are given in the paper, basing on the combination of hierarchical clustering and similarity, as well as outlier detection algorithm about similarity measurement is presented after the redefinition of high dimension density outliers is put. The algorithm has some value for outliers detection of high dimensional data set in view of experimental result.

Author(s):  
Qiang Ye ◽  
Weifeng Zhi

We propose an effective outlier detection algorithm for high-dimensional data. We consider manifold models of data as is typically assumed in dimensionality reduction/manifold learning. Namely, we consider a noisy data set sampled from a low-dimensional manifold in a high-dimensional data space. Our algorithm uses local geometric structure to determine inliers, from which the outliers are identified. The algorithm is applicable to both linear and nonlinear models of data. We also discuss various implementation issues and we present several examples to demonstrate the effectiveness of the new approach.


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has acquired a realistic response from data mining scientists as a graph of its reputation has increased smoothly in various practical domains like product marketing, fraud detection, medical diagnosis, fault detection and so many other fields. High dimensional data subjected to outlier detection poses exceptional challenges for data mining experts and it is because of natural problems of the curse of dimensionality and resemblance of distant and adjoining points. Traditional algorithms and techniques were experimented on full feature space regarding outlier detection. Customary methodologies concentrate largely on low dimensional data and hence show ineffectiveness while discovering anomalies in a data set comprised of a high number of dimensions. It becomes a very difficult and tiresome job to dig out anomalies present in high dimensional data set when all subsets of projections need to be explored. All data points in high dimensional data behave like similar observations because of its intrinsic feature i.e., the distance between observations approaches to zero as the number of dimensions extends towards infinity. This research work proposes a novel technique that explores deviation among all data points and embeds its findings inside well established density-based techniques. This is a state of art technique as it gives a new breadth of research towards resolving inherent problems of high dimensional data where outliers reside within clusters having different densities. A high dimensional dataset from UCI Machine Learning Repository is chosen to test the proposed technique and then its results are compared with that of density-based techniques to evaluate its efficiency.


Author(s):  
Bharat Gupta ◽  
Durga Toshniwal

In high dimensional data large no of outliers are embedded in low dimensional subspaces known as projected outliers, but most of existing outlier detection techniques are unable to find these projected outliers, because these methods perform detection of abnormal patterns in full data space. So, outlier detection in high dimensional data becomes an important research problem. In this paper we are proposing an approach for outlier detection of high dimensional data. Here we are modifying the existing SPOT approach by adding three new concepts namely Adaption of Sparse Sub-Space Template (SST), Different combination of PCS parameters and set of non outlying cells for testing data set.


Outlier detection is an interesting research area in machine learning. With the recently emergent tools and varied applications, the attention of outlier recognition is growing significantly. Recently, a significant number of outlier detection approaches have been observed and effectively applied in a wide range of fields, comprising medical health, credit card fraud and intrusion detection. They can be utilized for conservative data analysis. However, Outlier recognition aims to discover sequence in data that do not conform to estimated performance. In this paper, we presented a statistical approach called Z-score method for outlier recognition in high-dimensional data. Z-scores is a novel method for deciding distant data based on data positions on charts. The projected method is computationally fast and robust to outliers’ recognition. A comparative Analysis with extant methods is implemented with high dimensional datasets. Exploratory outcomes determines an enhanced accomplishment, efficiency and effectiveness of our projected methods.


2019 ◽  
Vol 16 (9) ◽  
pp. 3938-3944
Author(s):  
Atul Garg ◽  
Kamaljeet Kaur

In this era, detection of outliers or anomalies from high dimensional data is really a great challenge. Normal data is distinguished from data containing anomalies using Outlier detection techniques which classifies new data as normal or abnormal. Different Outlier Detection algorithms are proposed by many researchers for high dimensional data and each algorithm has its own benefits and limitations. In the literature the researchers proposed different algorithms. For this work few algorithms such as Dice-Coefficient Index (DCI), Mapreduce Function and Linear Discriminant Analysis Algorithm (LDA) are considered. Mapreduce function is used to overcome the problem of large datasets. LDA is basically used in the reduction of the data dimensionality. In the present work a novel Hybrid Outlier Detection Algorithm (HbODA) is proposed for efficiently detection of outliers in high dimensional data. The important parameters efficiency, accuracy, computation cost, precision, recall etc. are focused for analyzing the performance of the novel hybrid algorithm. Experimental results on real large sets show that the proposed algorithm is better in detecting outliers than other traditional methods.


2011 ◽  
Vol 366 ◽  
pp. 456-459 ◽  
Author(s):  
Jun Yang ◽  
Ying Long Wang

Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. In high-dimensional data, these approaches are bound to deteriorate due to the notorious “curse of dimensionality”. In this paper, we propose a novel approach named ODMC (Outlier Detection Based On Markov Chain),the effects of the “curse of dimensionality” are alleviated compared to purely distance-based approaches. A main advantage of our new approach is that our method is to use a major feature of an undirected weighted graph to calculate the outlier degree of each node, In a thorough experimental evaluation, we compare ODMC to the ABOD and FindFPOF for various artificial and real data set and show ODMC to perform especially well on high-dimensional data.


2020 ◽  
Vol 19 (01) ◽  
pp. 2040013 ◽  
Author(s):  
Firuz Kamalov ◽  
Ho Hon Leung

High-dimensional data poses unique challenges in outlier detection process. Most of the existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on dataset of small size with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of dealing with high-dimensional data by projecting the original data onto a smaller space and using the innate structure of the data to calculate anomaly scores for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by [Formula: see text]-score. Our method also produces better-than-average execution times compared with the benchmark methods.


Sign in / Sign up

Export Citation Format

Share Document