An Improved Semisupervised Outlier Detection Algorithm Based on Adaptive Feature Weighted Clustering

Various approaches to outlier detection already exist, among which semisupervised methods show encouraging results because they incorporate prior knowledge. In this paper, a semisupervised outlier detection strategy based on adaptive feature weighted clustering is proposed. The method maximizes the membership degree of each labeled normal object to the cluster it belongs to and minimizes the membership degrees of each labeled outlier to all clusters. Because features differ in how much they contribute to determining whether an object is an inlier or an outlier, each feature is adaptively assigned a weight according to the degree of deviation between that feature across all objects and the same feature of a given cluster prototype. A series of experiments on a synthetic dataset and several real-world datasets is carried out to verify the effectiveness and efficiency of the proposal.
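The role of feature weights and membership degrees described in this abstract can be illustrated with a minimal sketch. It is not the paper's objective function or update rule: the possibilistic-style membership formula, the scale parameter `eta`, the fuzzifier `m`, and the function names are all assumptions chosen for illustration, and the weights are supplied by hand rather than learned.

```python
def weighted_sq_distance(x, prototype, weights):
    """Squared distance in which each feature contributes according to its weight."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, prototype, weights))


def memberships(x, prototypes, weights, eta=1.0, m=2.0):
    """Membership degree of x to every cluster.

    Assumption: a possibilistic-style membership (memberships need not sum to 1),
    so an object far from all prototypes gets a low degree for every cluster.
    `weights[c]` holds the per-feature weights of cluster c.
    """
    degrees = []
    for prototype, w in zip(prototypes, weights):
        d2 = weighted_sq_distance(x, prototype, w)
        degrees.append(1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0))))
    return degrees


def outlier_score(x, prototypes, weights, eta=1.0, m=2.0):
    """Hypothetical score: close to 1 when x has low membership in all clusters."""
    return 1.0 - max(memberships(x, prototypes, weights, eta, m))


prototypes = [(0.0, 0.0), (5.0, 5.0)]
equal_weights = [(0.5, 0.5), (0.5, 0.5)]
print(outlier_score((0.0, 0.0), prototypes, equal_weights))    # sits on a prototype
print(outlier_score((20.0, 20.0), prototypes, equal_weights))  # far from both prototypes
```

Setting a feature's weight to zero makes the score ignore that feature entirely, which is the intuition behind weighting features by how informative they are for a given cluster.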

Overlapping Community Detection Based on Membership Degree Propagation

Entropy ◽ 10.3390/e23010015 ◽ 2020 ◽ Vol 23 (1) ◽ pp. 15

Author(s): Rui Gao ◽ Shoufeng Li ◽ Xiaohu Shi ◽ Yanchun Liang ◽ Dong Xu

Keyword(s): Complex Network ◽ Community Detection ◽ Real World ◽ Detection Algorithm ◽ Label Propagation ◽ Computational Time ◽ Membership Degree ◽ Overlapping Community Detection ◽ Overlapping Community ◽ Real World Datasets

A community in a complex network refers to a group of nodes that are densely connected internally but have only sparse connections to the outside. Overlapping community structures are ubiquitous in real-world networks, where each node belongs to at least one community. Therefore, overlapping community detection is an important topic in complex network research. This paper proposes an overlapping community detection algorithm based on membership degree propagation that is driven by both global and local information of the node community. In the method, we introduce the concept of a membership degree, which stores not only the label information but also the degree to which the node belongs to each label. The conventional label propagation process can then be extended to membership degree propagation, with the results mapped directly to the overlapping community division. The method therefore obtains the partition result and the overlapping node identification simultaneously, which greatly reduces the computational time. The proposed algorithm was applied to a synthetic Lancichinetti–Fortunato–Radicchi (LFR) dataset and nine real-world datasets and compared with other up-to-date algorithms. The experimental results show that the proposed algorithm is effective and outperforms the comparison methods on most datasets. It significantly improves the accuracy and speed of overlapping node prediction, and it can also substantially reduce the computational cost of community structure detection in general.
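The idea of propagating membership degrees rather than single labels can be sketched as follows. This is a generic, simplified propagation step in the spirit of label propagation with belonging coefficients, not the algorithm of the paper: the function name `propagate_step`, the pruning `threshold`, and the synchronous update are assumptions made for illustration, and the global and local information the abstract mentions is not modeled.

```python
from collections import defaultdict


def propagate_step(adjacency, memberships, threshold=0.3):
    """One synchronous update: each node averages its neighbours' membership degrees.

    adjacency:   dict node -> list of neighbouring nodes
    memberships: dict node -> dict label -> degree (degrees of a node sum to 1)
    Labels whose normalised degree falls below `threshold` are dropped, and the
    remaining degrees are renormalised so that they again sum to 1.
    """
    updated = {}
    for node, neighbours in adjacency.items():
        totals = defaultdict(float)
        for neighbour in neighbours:
            for label, degree in memberships[neighbour].items():
                totals[label] += degree
        if not totals:  # isolated node keeps its current memberships
            updated[node] = dict(memberships[node])
            continue
        norm = sum(totals.values())
        scaled = {label: value / norm for label, value in totals.items()}
        kept = {label: value for label, value in scaled.items() if value >= threshold}
        if not kept:  # keep the strongest label, ties broken by label name
            best = max(scaled, key=lambda label: (scaled[label], str(label)))
            kept = {best: scaled[best]}
        norm = sum(kept.values())
        updated[node] = {label: value / norm for label, value in kept.items()}
    return updated


# Two triangles that share node "c".
graph = {
    "a": ["b", "c"], "b": ["a", "c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e"], "e": ["c", "d"],
}
start = {"a": {"A": 1.0}, "b": {"A": 1.0}, "c": {"A": 0.5, "B": 0.5},
         "d": {"B": 1.0}, "e": {"B": 1.0}}
result = propagate_step(graph, start)
overlapping = [node for node, degrees in result.items() if len(degrees) > 1]
print(overlapping)
```

A node that ends with more than one label is read as an overlapping node, so the community partition and the overlapping nodes come out of the same pass.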

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽ 10.3390/data6010001 ◽ 2020 ◽ Vol 6 (1) ◽ pp. 1

Author(s): Ahmed Elmogy ◽ Hamada Rizk ◽ Amany M. Sarhan

Keyword(s): Data Mining ◽ Image Processing ◽ Intrusion Detection ◽ Real Time ◽ Outlier Detection ◽ Real World ◽ Medical Data ◽ Experimental Results ◽ Real Time Applications ◽ Real World Datasets

In data mining, outlier detection is a major challenge because it plays an important role in many applications, such as medical data, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively, on time and on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.

Overlapping Community Detection Based on Attribute Augmented Graph

Entropy ◽ 10.3390/e23060680 ◽ 2021 ◽ Vol 23 (6) ◽ pp. 680

Author(s): Hanyang Lin ◽ Yongzhao Zhan ◽ Zizheng Zhao ◽ Yuzhong Chen ◽ Chen Dong

Keyword(s): Community Detection ◽ Real World ◽ Detection Algorithm ◽ Overlapping Community Detection ◽ Overlapping Communities ◽ Adjustment Strategy ◽ Topology Information ◽ Overlapping Community ◽ Real World Datasets ◽ Community Detection Algorithm

There is a wealth of information in real-world social networks. In addition to the topology information, the vertices or edges of a social network often have attributes, with many of the overlapping vertices belonging to several communities simultaneously. It is challenging to fully utilize the additional attribute information to detect overlapping communities. In this paper, we first propose an overlapping community detection algorithm based on an augmented attribute graph. An improved weight adjustment strategy for attributes is embedded in the algorithm to help detect overlapping communities more accurately. Second, we enhance the algorithm to automatically determine the number of communities by a node-density-based fuzzy k-medoids process. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed algorithms can effectively detect overlapping communities with fewer parameters compared to the baseline methods.

Adaptive Double-Exploration Tradeoff for Outlier Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i04.6164 ◽ 2020 ◽ Vol 34 (04) ◽ pp. 6837-6844

Author(s): Xiaojin Zhang ◽ Honglei Zhuang ◽ Shengyu Zhang ◽ Yuan Zhou

Keyword(s): Confidence Interval ◽ Outlier Detection ◽ Real World ◽ Efficient Algorithm ◽ Experimental Results ◽ Sample Complexity ◽ Bandit Problem ◽ Real World Datasets ◽ Synthetic Datasets ◽ The Individual

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Distinct from the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm in terms of the sample complexity. Experimental results on both synthetic datasets and real-world datasets demonstrate the efficiency of our algorithm.
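What distinguishes this setting from the traditional TBP is that the threshold itself depends on the rewards of all arms. A minimal sketch of such a data-dependent threshold is given below. The abstract does not state the threshold function, so the k-sigma rule used here (mean of the arm means plus `k` standard deviations) is an assumption, as are the function names; the adaptive confidence interval and the sampling strategy of the paper are not reproduced.

```python
from statistics import mean, pstdev


def outlier_threshold(estimated_means, k=2.0):
    """Assumed k-sigma rule: the threshold is a function of all arms' mean rewards."""
    return mean(estimated_means) + k * pstdev(estimated_means)


def outlier_arms(estimated_means, k=2.0):
    """Indices of arms whose estimated mean reward lies above the threshold."""
    theta = outlier_threshold(estimated_means, k)
    return [i for i, mu in enumerate(estimated_means) if mu > theta]


estimates = [0.1, 0.1, 0.1, 0.1, 0.9]
print(outlier_threshold(estimates, k=1.5))
print(outlier_arms(estimates, k=1.5))
```

Because every arm's estimate enters the threshold, pulling any arm changes both that arm's estimate and the threshold, which is why the learner has to trade off exploring individual arms against exploring the threshold.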

An Efficient Distance and Density Based Outlier Detection Approach

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.155-156.342 ◽ 2012 ◽ Vol 155-156 ◽ pp. 342-347 ◽ Cited By ~ 1

Author(s): Xun Biao Zhong ◽ Xiao Xia Huang

Keyword(s): Outlier Detection ◽ Real World ◽ High Performance ◽ Detection Problem ◽ Empirical Results ◽ Detection Approach ◽ Real World Datasets ◽ Good Detection

To address the low accuracy and high computational cost of density-based outlier detection, this paper proposes a variance of distance and density (VDD) measure. The proposed k-means clustering and score based VDD (KSVDD) approach detects outliers efficiently and with high performance. Two real-world datasets are used to illustrate the feasibility of the approach. Empirical results show that KSVDD achieves good detection precision.

FairLOF: Fairness in Outlier Detection

Data Science and Engineering ◽ 10.1007/s41019-021-00169-x ◽ 2021

Author(s): Deepak P ◽ Savitha Sam Abraham

Keyword(s): Marital Status ◽ Outlier Detection ◽ Real World ◽ Detection Method ◽ Empirical Evaluation ◽ Evaluation Framework ◽ Real World Datasets

An outlier detection method may be considered fair over specified sensitive attributes if the results of outlier detection are not skewed toward particular groups defined on such sensitive attributes. In this paper, we consider the task of fair outlier detection. Our focus is on the task of fair outlier detection over multiple multi-valued sensitive attributes (e.g., gender, race, religion, nationality and marital status, among others), one that has broad applications across modern data scenarios. We propose a fair outlier detection method, FairLOF, that is inspired by the popular LOF formulation for neighborhood-based outlier detection. We outline ways in which unfairness could be induced within LOF and develop three heuristic principles to enhance fairness, which form the basis of the FairLOF method. Since this is a novel task, we develop an evaluation framework for fair outlier detection, and use that to benchmark FairLOF on quality and fairness of results. Through an extensive empirical evaluation over real-world datasets, we illustrate that FairLOF is able to achieve significant improvements in fairness at the cost of sometimes marginal degradations in result quality as measured against the fairness-agnostic LOF method. We also show that a generalization of our method, named FairLOF-Flex, is able to open possibilities of further deepening fairness in outlier detection beyond what is offered by FairLOF.
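For readers unfamiliar with the baseline that FairLOF builds on, the following is a minimal sketch of the standard, fairness-agnostic LOF score (k-distance, reachability distance, local reachability density). It does not include any of the FairLOF heuristics, which the abstract does not specify. The function names are illustrative, ties at the k-th neighbour are broken by index rather than kept, and the points are assumed to be distinct so that no density is infinite.

```python
import math


def k_nearest(points, i, k):
    """(distance, index) pairs of the k nearest neighbours of points[i]."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return dists[:k]


def lof_scores(points, k):
    """Local outlier factor of every point; values well above 1 indicate outliers."""
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(data, k=2)
print(scores)
```

In this toy dataset the four corner points have a score of about 1, while the isolated point scores far higher because its neighbours are much denser than it is.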

Online Reputation Fraud Campaign Detection in User Ratings

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2017/541 ◽ 2017 ◽ Cited By ~ 2

Author(s): Chang Xu ◽ Jie Zhang ◽ Zhu Sun

Keyword(s): Real World ◽ Historical Data ◽ Collective Behaviors ◽ Optimization Framework ◽ Effectiveness And Efficiency ◽ Real World Datasets ◽ Online Reputation ◽ User Ratings

Reputation fraud campaigns (RFCs) distort the reputations of rated items by generating fake ratings through multiple spammers. One effective way of detecting RFCs is to characterize their collective behaviors based on rating histories. However, these campaigns are constantly evolving and changing tactics to evade detection. For example, they can launch early attacks on the items to quickly dominate the reputations. They can also whitewash themselves by creating new accounts for subsequent attacks. It is thus challenging for existing approaches working on historical data to react promptly to such emerging fraud activities. In this paper, we conduct RFC detection in an online fashion, so as to spot campaign activities as early as possible. This leads to a unified and scalable optimization framework, FraudScan, that can adapt to emerging fraud patterns over time. Empirical analysis on two real-world datasets validates the effectiveness and efficiency of the proposed framework.

Multi-Aspect Embedding for Attribute-Aware Trajectories

Symmetry ◽ 10.3390/sym11091149 ◽ 2019 ◽ Vol 11 (9) ◽ pp. 1149

Author(s): Thapana Boonchoo ◽ Xiang Ao ◽ Qing He

Keyword(s): Real World ◽ Execution Time ◽ State Of The Art ◽ Representation Learning ◽ Learning Approach ◽ Trajectory Data ◽ Trajectory Mining ◽ Trajectory Similarity ◽ Effectiveness And Efficiency ◽ Real World Datasets

With the proliferation of trajectory data produced by advanced GPS-enabled devices, trajectories are gaining in complexity and beginning to carry additional attributes beyond the coordinates alone. This makes it possible to define the similarity between two attribute-aware trajectories. However, most existing trajectory similarity approaches focus only on location-based proximities and fail to capture the semantic similarities encompassed by these additional asymmetric attributes (aspects) of trajectories. In this paper, we propose multi-aspect embedding for attribute-aware trajectories (MAEAT), a representation learning approach for trajectories that simultaneously models the similarities according to their multiple aspects. MAEAT is built upon a sentence embedding algorithm and directly learns whole-trajectory embeddings by predicting the context aspect tokens of a given trajectory. Two kinds of token generation methods are proposed to extract multiple aspects from the raw trajectories, and a regularization is devised to control the relative importance of the aspects. Extensive experiments on the benchmark and real-world datasets show the effectiveness and efficiency of the proposed MAEAT compared to the state-of-the-art and baseline methods. The results of MAEAT can well support representative downstream trajectory mining and management tasks, and the algorithm outperforms the other compared methods in execution time by at least two orders of magnitude.

A Dynamic and Static Context-Aware Attention Network for Trajectory Prediction

ISPRS International Journal of Geo-Information ◽ 10.3390/ijgi10050336 ◽ 2021 ◽ Vol 10 (5) ◽ pp. 336

Author(s): Jian Yu ◽ Meng Zhou ◽ Xin Wang ◽ Guoliang Pu ◽ Chengqi Cheng ◽ ...

Keyword(s): Real World ◽ Autonomous Driving ◽ Attention Mechanism ◽ Context Aware ◽ Trajectory Prediction ◽ Attention Network ◽ Series Of Experiments ◽ Real World Datasets ◽ The Moment ◽ Autonomous Driving System

Forecasting the motion of surrounding vehicles is necessary for an autonomous driving system operating in complex traffic. Trajectory prediction gives vehicles foresight and helps them make more sensible decisions. However, traditional models treat trajectory prediction as a simple sequence prediction task. Ignoring inter-vehicle interaction and the influence of the environment degrades these models on real-world datasets. To address this issue, we propose a novel Dynamic and Static Context-aware Attention Network, named DSCAN. DSCAN uses an attention mechanism to decide dynamically which surrounding vehicles are more important at a given moment. We also equip DSCAN with a constraint network that takes static environment information into account. We conducted a series of experiments on a real-world dataset, and the experimental results demonstrated the effectiveness of our model. Moreover, the present study suggests that the attention mechanism and static constraints enhance the prediction results.

Selective oversampling approach for strongly imbalanced data

PeerJ Computer Science ◽ 10.7717/peerj-cs.604 ◽ 2021 ◽ Vol 7 ◽ pp. e604

Author(s): Peter Gnip ◽ Liberios Vokorokos ◽ Peter Drotár

Keyword(s): Outlier Detection ◽ Real World ◽ State Of The Art ◽ Imbalanced Data ◽ Prediction Performance ◽ Classifier Performance ◽ Real World Applications ◽ Real World Datasets ◽ Synthetic Datasets ◽ Representative Samples

Challenges posed by imbalanced data are encountered in many real-world applications. One possible approach to improving classifier performance on imbalanced data is oversampling. In this paper, we propose a new selective oversampling approach (SOA) that first isolates the most representative samples from minority classes by using an outlier detection technique and then utilizes these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely, the synthetic minority oversampling technique and adaptive synthetic sampling. The prediction performance is evaluated on four synthetic datasets and four real-world datasets, and the proposed SOA methods always achieved the same or better performance than the other existing oversampling methods considered.
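The two-stage idea in this abstract (filter the minority class with an outlier detector, then oversample only from what remains) can be sketched as below. The abstract does not name the outlier detection technique, so the mean-distance score used here is an assumption. The interpolation step is a simplified stand-in for SMOTE-style oversampling: it interpolates between random pairs of kept samples rather than between nearest neighbours. The function name and parameters are illustrative.

```python
import math
import random


def selective_oversample(minority, n_new, drop_fraction=0.2, seed=0):
    """Drop the most outlying minority samples, then interpolate new ones.

    minority:      list of numeric tuples (minority-class samples)
    n_new:         number of synthetic samples to generate
    drop_fraction: share of minority samples discarded as outliers (assumed knob)
    Requires at least two samples to remain after filtering.
    """
    rng = random.Random(seed)

    def mean_distance(i):
        # Assumed outlier score: mean distance to all other minority samples.
        others = [q for j, q in enumerate(minority) if j != i]
        return sum(math.dist(minority[i], q) for q in others) / len(others)

    ranked = sorted(range(len(minority)), key=mean_distance)
    n_keep = len(minority) - int(len(minority) * drop_fraction)
    kept = [minority[i] for i in ranked[:n_keep]]

    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(kept, 2)  # two distinct representative samples
        t = rng.random()            # interpolation factor in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return kept, synthetic


samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (50.0, 50.0)]
kept, synthetic = selective_oversample(samples, n_new=10)
print(kept)
print(len(synthetic))
```

Filtering first keeps the isolated sample at (50, 50) from seeding synthetic points far from the bulk of the minority class, which is the motivation the abstract gives for selecting representative samples before oversampling.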