Vibration-Based Outlier Detection on High Dimensional Data

2016 ◽  
Vol 25 (03) ◽  
pp. 1650013
Author(s):  
Shuyin Xia ◽  
Guoyin Wang ◽  
Hong Yu ◽  
Qun Liu ◽  
Jin Wang

Outlier detection is a difficult problem because its time complexity is quadratic or cubic in most cases, which makes it necessary to develop corresponding acceleration algorithms. Since the main acceleration algorithms rely on index structures (cf. the R-tree), those approaches deteriorate as dimensionality increases. In this paper, an approach named VBOD (vibration-based outlier detection) is proposed, in which the main variants assess vibration. Since the basic model and the approximation algorithm FASTVBOD do not need to compute an index structure, their performance is less sensitive to increasing dimensionality than that of traditional approaches. The basic model of the approach has only quadratic time complexity, and the accelerated algorithms decrease the time complexity to [Formula: see text]. Another advantage is that the approach does not rely on any parameter selection. FASTVBOD was compared with other state-of-the-art algorithms and performed much better than the other methods, especially on high-dimensional data.

2020 ◽  
Vol 19 (01) ◽  
pp. 2040013 ◽  
Author(s):  
Firuz Kamalov ◽  
Ho Hon Leung

High-dimensional data poses unique challenges in the outlier detection process. Most existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on datasets of small size with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of dealing with high-dimensional data by projecting the original data onto a smaller space and using the innate structure of the data to calculate anomaly scores for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by the [Formula: see text]-score. Our method also produces better-than-average execution times compared with the benchmark methods.
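The projection-plus-density-estimation pipeline described above can be sketched with plain NumPy. This is a minimal illustration under assumed choices (SVD-based PCA, a fixed-bandwidth Gaussian kernel, synthetic data), not the authors' exact method; the function name and parameters are invented for the example:

```python
import numpy as np

def pca_kde_scores(X, n_components=2, bandwidth=0.5):
    """Anomaly scores via PCA projection followed by Gaussian KDE.

    Higher score = lower estimated density = more anomalous.
    """
    # Center the data and project onto the top principal components via SVD.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T          # shape (n_samples, n_components)

    # Fixed-bandwidth Gaussian kernel density estimate at each projected point.
    n, d = Z.shape
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    kernel = np.exp(-sq_dists / (2 * bandwidth ** 2))
    density = kernel.sum(axis=1) / (n * (bandwidth * np.sqrt(2 * np.pi)) ** d)

    return -np.log(density)               # low density -> high anomaly score

# Usage: 50 points in a tight 20-dimensional cluster plus one planted
# outlier at index 50, which receives the highest anomaly score.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, size=(50, 20)),
               np.full((1, 20), 5.0)])
scores = pca_kde_scores(X)
```

The bandwidth is a free parameter here; in practice it would be tuned (e.g. by cross-validation) rather than fixed.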


Author(s):  
Никита Сергеевич Олейник ◽  
Владислав Юрьевич Щеколдин

The problem of detecting anomalous observations in high-dimensional data is considered on the basis of multidimensional scaling, taking into account the possibility of constructing a high-quality data visualization. An algorithm is proposed for a modified Torgerson principal-projection method, based on constructing the projection subspace for the source data by changing the way the scalar-product matrix is factorized, using cumulative curve analysis. The empirical distribution of the F-measure is constructed and analyzed for different variants of projecting the source data.

Purpose. The paper aims to develop methods of multidimensional data representation for solving classification problems based on cumulative curve analysis. It considers the outlier detection problem for high-dimensional data on the basis of multidimensional scaling, in order to construct high-quality data visualizations. An anomalous observation (or outlier), according to D. Hawkins, is an observation that differs so much from the others that it may be assumed to have appeared in the sample in a fundamentally different way.

Methods. One of the conceptual approaches that allows classifying sample observations is multidimensional scaling, represented by the classical Orlochi method, the Torgerson principal projections, and others. The Torgerson method assumes that, to obtain the most convenient classification when transforming the data, the origin must be placed at the center of gravity of the analyzed data; the matrix of scalar products of vectors with the origin at that center is then computed, the two largest eigenvalues and the corresponding eigenvectors are chosen, and the projection matrix is evaluated. Moreover, the method assumes a linear separation of regular and anomalous observations, which rarely occurs. It is therefore logical to choose, among the possible projection axes, those that yield more effective results for the outlier detection problem. A modified CC-ABOD (Cumulative Curves for Angle-Based Outlier Detection) procedure has been applied to estimate the visualization quality. It is based on estimating the variances of the angles formed by a particular observation and the remaining observations in multidimensional space. Cumulative curve analysis is then applied, which allows separating groups of closely localized observations (in accordance with the chosen metric) and forming classes of regular, intermediate, and anomalous observations.

Results. The proposed modification of the Torgerson method is developed. The F1-measure distribution is constructed and analyzed for different projection options on the source data. Analysis of the empirical distribution showed that in a number of cases the best axes correspond to the second, third, or even fourth largest eigenvalues.

Findings. Multidimensional scaling methods for constructing visualizations of multidimensional data and for solving outlier detection problems have been considered. It was found that determining the projection is an ambiguous problem.
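The angle-variance idea underlying the CC-ABOD procedure can be sketched as follows. This is a minimal illustration of the classical angle-based outlier detection principle (variance of cosines to all pairs of other points), not the authors' modified procedure; the function name and the synthetic data are invented for the example:

```python
import numpy as np

def angle_variance_scores(X):
    """For each point, the variance of cosines of angles to all pairs of
    other points. A point outside the data cloud sees the rest of the data
    within a narrow cone, so a LOW variance indicates an outlier."""
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        diffs = np.delete(X, i, axis=0) - X[i]   # vectors to the other points
        norms = np.linalg.norm(diffs, axis=1)
        unit = diffs / norms[:, None]            # unit direction vectors
        cos = unit @ unit.T                      # pairwise cosines of angles
        iu = np.triu_indices(len(diffs), k=1)    # each pair counted once
        scores[i] = cos[iu].var()
    return scores

# Usage: a 5-dimensional cluster of 30 points plus one distant point at
# index 30; the distant point yields the smallest angle variance.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(30, 5)),
               np.full((1, 5), 10.0)])
scores = angle_variance_scores(X)
```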


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has attracted growing attention from data mining scientists, as its reputation has risen steadily in practical domains such as product marketing, fraud detection, medical diagnosis, fault detection, and many other fields. High-dimensional data subjected to outlier detection poses exceptional challenges for data mining experts, owing to the inherent problems of the curse of dimensionality and the resemblance of distant and adjoining points. Traditional algorithms and techniques were tested on the full feature space for outlier detection. Customary methodologies concentrate largely on low-dimensional data and hence prove ineffective at discovering anomalies in a data set comprising a high number of dimensions. Digging out the anomalies present in a high-dimensional data set becomes a very difficult and tiresome job when all subsets of projections need to be explored. All points in high-dimensional data behave like similar observations because of an intrinsic feature of such data: the distance between observations approaches zero as the number of dimensions extends towards infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. It is a state-of-the-art technique, as it opens a new breadth of research towards resolving the inherent problems of high-dimensional data, where outliers reside within clusters having different densities. A high-dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are then compared with those of density-based techniques to evaluate its efficiency.
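The distance-concentration effect the abstract refers to (distances between observations becoming indistinguishable as dimensionality grows) can be demonstrated numerically. A minimal sketch; the function name, sample size, and uniform synthetic data are assumptions for the illustration:

```python
import numpy as np

def relative_contrast(n_points=500, dim=2, seed=0):
    """Relative contrast (Dmax - Dmin) / Dmin of distances from the origin
    to uniform random points. As the dimension grows, the contrast shrinks:
    nearest and farthest neighbors become nearly equidistant."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n_points, dim))
    d = np.linalg.norm(X, axis=1)
    return (d.max() - d.min()) / d.min()

# Contrast decays with dimension: in 2000 dimensions all 500 points sit
# at almost the same distance from the query point.
contrasts = {dim: relative_contrast(dim=dim) for dim in (2, 20, 200, 2000)}
```

This is why distance-based outlier scores lose discriminative power on the full feature space of high-dimensional data.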


2022 ◽  
Vol 13 (1) ◽  
pp. 1-17
Author(s):  
Ankit Kumar ◽  
Abhishek Kumar ◽  
Ali Kashif Bashir ◽  
Mamoon Rashid ◽  
V. D. Ambeth Kumar ◽  
...  

Detection of outliers or anomalies is one of the vital issues in pattern-driven data mining. Outlier detection identifies the inconsistent behavior of individual objects. It is an important sector of the data mining field, with several different applications such as detecting credit card fraud, discovering hacking, and uncovering criminal activities. It is necessary to develop tools for extracting the critical information embedded in extensive data. This paper investigates a novel method for detecting cluster outliers in a multidimensional dataset, capable of identifying the clusters and outliers in datasets containing noise. The proposed method can detect the groups and outliers left by the clustering process, such as the resulting irregular sets of clusters (C) and outliers (O), to boost the results. The results obtained after applying the algorithm to the dataset improved in terms of several parameters. For the comparative analysis, the average accuracy and the average recall are computed. The average accuracy of the existing COID algorithm is 74.05%, while the proposed algorithm achieves 77.21%. The average recall values are 81.19% and 89.51% for the existing and proposed algorithms respectively, which shows that the efficiency of the proposed work is better than that of the existing COID algorithm.
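For reference, accuracy and recall figures like those quoted above are computed from confusion-matrix counts in the usual way. The counts in this sketch are made up for illustration and are not the paper's data:

```python
def accuracy_recall(tp, fp, tn, fn):
    """Accuracy and recall from confusion-matrix counts of an
    outlier-detection evaluation (outlier = positive class)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction labeled correctly
    recall = tp / (tp + fn)                      # fraction of true outliers found
    return accuracy, recall

# Hypothetical counts: 18 outliers caught, 7 missed, 5 false alarms.
acc, rec = accuracy_recall(tp=18, fp=5, tn=70, fn=7)
# acc = 88/100 = 0.88, rec = 18/25 = 0.72
```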

