Combine Value Clustering and Weighted Value Coupling Learning for Outlier Detection in Categorical Data

Author(s):  
Hongzuo Xu ◽  
Yongjun Wang ◽  
Zhiyue Wu ◽  
Xingkong Ma ◽  
Zhiquan Qin
Author(s):  
Hongzuo Xu ◽  
Yongjun Wang ◽  
Zhiyue Wu ◽  
Yijie Wang

Non-IID categorical data is ubiquitous in real-world applications. Learning various kinds of couplings has proven to be a reliable approach to detecting outliers in such non-IID data. However, it is a critical yet challenging problem to model, represent, and utilise high-order complex value couplings. Existing outlier detection methods normally focus only on pairwise primary value couplings and fail to uncover the real relations hidden in complex couplings, resulting in suboptimal and unstable performance. This paper introduces a novel unsupervised embedding-based complex value coupling learning framework, EMAC, and its instance, SCAN, to address these issues. SCAN first models primary value couplings. Then, coupling bias is defined to capture complex value couplings at different granularities and to highlight the essence of outliers. An embedding method is then applied to the value network constructed from the biased value couplings; it further learns high-order complex value couplings and embeds them into a value representation matrix. Bidirectional selective value coupling learning is proposed to estimate value and object outlierness through value couplings. Extensive experiments show that SCAN (i) significantly outperforms five state-of-the-art outlier detection methods on thirteen real-world datasets; and (ii) is much more resilient to noise than its competitors.
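
To make the notion of a pairwise (primary) value coupling concrete, the sketch below estimates the conditional probability that one attribute value co-occurs with another. This is only an illustration under stated assumptions, not SCAN itself: the abstract does not give its coupling definition, and the function name `pairwise_couplings` and the toy rows are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pairwise_couplings(rows):
    """Estimate P(v | u) for every ordered pair of co-occurring values.

    rows: list of equal-length tuples of categorical values.
    A value is identified as (attribute index, value), so the same
    label in two different attributes is never confused.
    Assumption: conditional co-occurrence probability stands in for
    a "primary value coupling"; the paper may define it differently.
    """
    value_counts = Counter()
    pair_counts = Counter()
    for row in rows:
        values = list(enumerate(row))
        value_counts.update(values)
        # Ordered pairs of values taken from different attributes of one row.
        pair_counts.update(permutations(values, 2))
    return {
        (u, v): count / value_counts[u]
        for (u, v), count in pair_counts.items()
    }


rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y")]
couplings = pairwise_couplings(rows)
```

Here `couplings[((0, "a"), (1, "x"))]` is the fraction of rows containing value `a` in attribute 0 that also contain `x` in attribute 1. Pairs that never co-occur are simply absent from the dictionary.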


2010 ◽  
Vol 29 (3) ◽  
pp. 697-725 ◽  
Author(s):  
Anna Koufakou ◽  
Jimmy Secretan ◽  
Michael Georgiopoulos

2014 ◽  
Vol 67 ◽  
pp. 90-99 ◽  
Author(s):  
Hao-Ting Pai ◽  
Fan Wu ◽  
Pei-Yun S. (Sabrina) Hsueh

2020 ◽  
Vol 9 (3) ◽  
pp. 100-117
Author(s):  
Sangeetha T. ◽  
Geetha Mary A.

Data mining is the process of recognizing patterns in, and extracting knowledge from, massive databases. Objects that deviate from other objects in their characteristics or behavior are known as outliers. Research on outlier detection so far has focused only on numerical data, categorical data, and single universal sets. The main goal of this article is to detect significant outliers in two universal sets by applying the intuitionistic fuzzy cut relationship based on membership and non-membership values. The proposed method, weighted density outlier detection, is based on rough entropy. Because the method is unsupervised, weighted density values for all conditional attributes and objects are calculated without considering the class labels of decision attributes. For experimental analysis, the Iris dataset from the UCI repository is used, and comparisons are made with existing algorithms to demonstrate the method's efficiency.


2018 ◽  
Vol 27 (03) ◽  
pp. 1850005 ◽  
Author(s):  
Hafiz Asif ◽  
Tanay Talukdar ◽  
Jaideep Vaidya ◽  
Basit Shafiq ◽  
Nabil Adam

Outlier detection is one of the most important data analytics tasks and is used in numerous applications and domains. The goal of outlier detection is to find abnormal entities that are significantly different from the remaining data. Often, the underlying data is distributed across different organizations. If outlier detection is done locally, the results obtained are not as accurate as when outlier detection is done collaboratively over the combined data. However, the data cannot be easily integrated into a single database due to privacy and legal concerns. In this paper, we address precisely this problem. We first define privacy in the context of collaborative outlier detection. We then develop a novel method to find outliers from both horizontally partitioned and vertically partitioned categorical data in a privacy-preserving manner. Our method is based on a scalable outlier detection technique that uses attribute value frequencies. We provide an end-to-end privacy guarantee by using the differential privacy model and secure multiparty computation techniques. Experiments on real data show that our proposed technique is both effective and efficient.
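
As a rough illustration of outlier scoring with attribute value frequencies, the sketch below scores each record by the average frequency of its attribute values, so that records built from rare values score lowest. It is a plain, non-private, single-site sketch: the abstract does not spell out its scoring formula, and the privacy-preserving parts (differential privacy, secure multiparty computation) are not shown. The function name `avf_scores` and the toy data are hypothetical.

```python
from collections import Counter


def avf_scores(rows):
    """Score each row by the mean frequency of its attribute values.

    rows: list of equal-length tuples of categorical values.
    Lower scores indicate rows made up of rarer values, which are
    treated as more likely outliers.
    Assumption: this averaging rule is one common frequency-based
    score, not necessarily the exact one used in the paper.
    """
    if not rows:
        return []
    n_attrs = len(rows[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in rows
    ]


rows = [
    ("red", "small"),
    ("red", "small"),
    ("red", "large"),
    ("blue", "tiny"),
]
scores = avf_scores(rows)
most_outlying = min(range(len(rows)), key=scores.__getitem__)
```

In this toy data the last row, `("blue", "tiny")`, contains two values that each occur only once, so it receives the lowest score. The score needs only per-attribute counts, which is what makes this kind of technique cheap to compute on large datasets.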


2019 ◽  
Vol 365 ◽  
pp. 325-335 ◽  
Author(s):  
Li Cheng ◽  
Yijie Wang ◽  
Xingkong Ma
