OFCOD: On the Fly Clustering Based Outlier Detection Framework

In data mining, outlier detection is a major challenge because it plays an important role in many applications, such as medical data analysis, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively, in time and on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms existing approaches on several evaluation metrics.


Adaptive Double-Exploration Tradeoff for Outlier Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6164 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6837-6844

Author(s):

Xiaojin Zhang ◽

Honglei Zhuang ◽

Shengyu Zhang ◽

Yuan Zhou

Keyword(s):

Confidence Interval ◽

Outlier Detection ◽

Real World ◽

Efficient Algorithm ◽

Experimental Results ◽

Sample Complexity ◽

Bandit Problem ◽

Real World Datasets ◽

Synthetic Datasets ◽

The Individual

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Distinct from the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm in terms of the sample complexity. Experimental results on both synthetic datasets and real-world datasets demonstrate the efficiency of our algorithm.
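To make the idea of a threshold that depends on all arms concrete, here is a minimal sketch in Python. It is not the adaptive algorithm from the abstract: it is a naive uniform-sampling baseline that pulls every arm equally often and builds no confidence intervals. The k-sigma threshold rule (mean of the arm means plus k standard deviations), the Bernoulli rewards, and the names `detect_outlier_arms`, `pulls_per_arm`, and `k` are all assumptions made for illustration.

```python
import random
import statistics


def detect_outlier_arms(arm_means, pulls_per_arm=2000, k=1.5, seed=0):
    """Naive baseline: sample every arm equally, then threshold.

    arm_means holds the true Bernoulli success probabilities, which are
    hidden from a real learner and used here only to simulate rewards.
    """
    rng = random.Random(seed)
    estimates = []
    for p in arm_means:
        # Simulate Bernoulli rewards and keep the empirical mean.
        rewards = [1.0 if rng.random() < p else 0.0 for _ in range(pulls_per_arm)]
        estimates.append(statistics.fmean(rewards))
    # Assumed threshold rule: it is a function of ALL arms' estimated rewards,
    # so it has to be learned together with the arms themselves.
    threshold = statistics.fmean(estimates) + k * statistics.pstdev(estimates)
    outliers = [i for i, est in enumerate(estimates) if est > threshold]
    return outliers, threshold


if __name__ == "__main__":
    arms, thr = detect_outlier_arms([0.1, 0.1, 0.1, 0.1, 0.9])
    print("outlier arms:", arms, "estimated threshold:", round(thr, 3))
```

A uniform baseline like this spends the same budget on every arm. The abstract's contribution is to trade off exploring individual arms against exploring the threshold, which this sketch does not attempt.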


Semi-Supervised Outlier Detection with Only Positive and Unlabeled Data Based on Fuzzy Clustering

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015500037 ◽

2015 ◽

Vol 24 (03) ◽

pp. 1550003 ◽

Cited By ~ 1

Author(s):

Armin Daneshpazhouh ◽

Ashkan Sami

Keyword(s):

Intrusion Detection ◽

Outlier Detection ◽

Fuzzy Clustering ◽

Real World ◽

State Of The Art ◽

Real Data ◽

Experimental Results ◽

The Other ◽

Data Sets ◽

Real World Applications

The task of semi-supervised outlier detection is to find the instances that deviate from the rest of the data, using some labeled examples. This task is especially important in applications such as fraud detection and intrusion detection. Most existing techniques are unsupervised. Semi-supervised approaches, on the other hand, use both negative and positive instances to detect outliers. However, in many real-world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is used to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms the previous unsupervised state-of-the-art methods in detecting outliers.
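The first step described above, extracting reliable negatives with a kNN-based rule, could look roughly like the following Python sketch. The abstract does not give the exact rule, so the scoring used here (mean distance from an unlabeled point to its k nearest labeled positives, keeping the farthest points) is an assumption, as are the names `reliable_negatives`, `k`, and `n_select`.

```python
import math


def reliable_negatives(positives, unlabeled, k=2, n_select=3):
    """Pick the unlabeled points that lie farthest from the labeled positives.

    Assumed rule: score each unlabeled point by the mean distance to its
    k nearest positive (outlier) examples; a large score suggests the
    point is a safe "negative" (normal) example.
    """
    def score(point):
        nearest = sorted(math.dist(point, q) for q in positives)[:k]
        return sum(nearest) / len(nearest)

    ranked = sorted(unlabeled, key=score, reverse=True)
    return ranked[:n_select]


if __name__ == "__main__":
    positives = [(9.0, 9.0), (10.0, 10.0)]
    unlabeled = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (8.0, 9.0), (9.0, 10.0)]
    print(reliable_negatives(positives, unlabeled))
```

The selected negatives and the labeled positives would then both feed the fuzzy clustering stage, which this sketch leaves out.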


Review Summary Generation in Online Systems: Frameworks for Supervised and Unsupervised Scenarios

ACM Transactions on the Web ◽

10.1145/3448015 ◽

2021 ◽

Vol 15 (3) ◽

pp. 1-33

Author(s):

Wenjun Jiang ◽

Jing Chen ◽

Xiaofei Ding ◽

Jie Wu ◽

Jiawei He ◽

...

Keyword(s):

Decision Making ◽

Real World ◽

Text Summarization ◽

Experimental Results ◽

Product Review ◽

Comprehensive Review ◽

Online Systems ◽

Real World Datasets ◽

Different Characteristics

In online systems, including e-commerce platforms, many users rely on the reviews or comments written by previous consumers for decision making, but they have limited time to read many reviews. Therefore, a review summary that contains all important features of the user-generated reviews is desirable. In this article, we study “how to generate a comprehensive review summary from a large number of user-generated reviews.” This can be implemented by text summarization, which has two main types: extractive and abstractive approaches. Both approaches can handle supervised and unsupervised scenarios, but the former may generate redundant and incoherent summaries, while the latter can avoid redundancy but usually can only handle short sequences. Moreover, both approaches may neglect sentiment information. To address these issues, we propose comprehensive Review Summary Generation frameworks for the supervised and unsupervised scenarios. We design two different preprocessing models, re-ranking and selecting, to identify the important sentences while preserving the users’ sentiment in the original reviews. These sentences can then be used to generate review summaries with text summarization methods. Experimental results on seven real-world datasets (Idebate, Rotten Tomatoes, Amazon, Yelp, and three unlabelled product review datasets in Amazon) demonstrate that our work performs well in review summary generation. Moreover, the re-ranking and selecting models show different characteristics.


Dimension Reduction for Objects Composed of Vector Sets

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2017-0012 ◽

2017 ◽

Vol 27 (1) ◽

pp. 169-180 ◽

Cited By ~ 1

Author(s):

Marton Szemenyei ◽

Ferenc Vajda

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Discriminant Analysis ◽

Probability Distribution ◽

Dimension Reduction ◽

Pose Estimation ◽

Real World ◽

Single Object ◽

Real World Datasets

Dimension reduction and feature selection are fundamental tools for machine learning and data mining. Most existing methods, however, assume that objects are represented by a single vectorial descriptor. In reality, some description methods assign unordered sets or graphs of vectors to a single object, where each vector is assumed to have the same number of dimensions, but is drawn from a different probability distribution. Moreover, some applications (such as pose estimation) may require the recognition of individual vectors (nodes) of an object. In such cases it is essential that the nodes within a single object remain distinguishable after dimension reduction. In this paper we propose new discriminant analysis methods that are able to satisfy two criteria at the same time: separating between classes and between the nodes of an object instance. We analyze and evaluate our methods on several different synthetic and real-world datasets.


Image Enhancement Techniques Using Particle Swarm Optimization Technique

Advances in Computational Intelligence and Robotics - Handbook of Research on Swarm Intelligence in Engineering ◽

10.4018/978-1-4666-8291-7.ch010 ◽

2015 ◽

pp. 327-347 ◽

Cited By ~ 2

Author(s):

V. Santhi ◽

B. K. Tripathy

Keyword(s):

Image Processing ◽

Pattern Recognition ◽

Particle Swarm Optimization ◽

Real Time ◽

Quality Enhancement ◽

Swarm Optimization ◽

Image Processing Techniques ◽

Real Time Applications ◽

High Level ◽

Processing Techniques

Image quality enhancement is considered one of the basic requirements for high-level image processing techniques that demand good-quality images. High-level image processing techniques include feature extraction, morphological processing, pattern recognition, automation engineering, and many more. Many classical enhancement methods are available for improving image quality, and they can be carried out either in the spatial domain or in the frequency domain. In real-time applications, however, the quality enhancement carried out by classical approaches may not serve the purpose. Computational intelligence needs to be combined with the classical approaches to meet the requirements of real-time applications. Recently, Particle Swarm Optimization (PSO) has come to be considered one of the newer optimization techniques, and it is used extensively in image processing and pattern recognition applications. In this chapter, image enhancement is treated as an optimization problem, and different methods to solve it through PSO are discussed in detail.
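As a rough illustration of treating enhancement as an optimization problem, the Python sketch below uses a basic one-dimensional PSO to tune a single gamma-correction parameter for a list of grayscale pixel values in [0, 1]. It is not taken from the chapter: the toy objective (bring mean brightness close to mid-gray), the PSO constants, and the names `gamma_correct`, `fitness`, and `pso_gamma` are all assumptions chosen to keep the example small.

```python
import random
import statistics


def gamma_correct(pixels, gamma):
    """Apply gamma correction to pixel intensities in [0, 1]."""
    return [p ** gamma for p in pixels]


def fitness(pixels, gamma, target=0.5):
    """Toy objective (assumption): distance of mean brightness from mid-gray.

    Lower is better. A real system would use a richer quality measure.
    """
    return abs(statistics.fmean(gamma_correct(pixels, gamma)) - target)


def pso_gamma(pixels, n_particles=20, iters=100, lo=0.1, hi=5.0,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a gamma value in [lo, hi] with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                      # personal best positions
    best_fit = [fitness(pixels, x) for x in pos]
    g = min(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g], best_fit[g]  # global best so far
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity mixes inertia, pull to personal best, pull to global best.
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Keep the particle inside the search interval.
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
            f = fitness(pixels, pos[i])
            if f < best_fit[i]:
                best_pos[i], best_fit[i] = pos[i], f
                if f < g_fit:
                    g_pos, g_fit = pos[i], f
    return g_pos


if __name__ == "__main__":
    dark = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.1, 0.2]
    gamma = pso_gamma(dark)
    print("gamma:", round(gamma, 3))
```

For a dark image such as the one in the demo, the search should settle on a gamma below 1, which brightens the pixels. Swapping in a different fitness function changes what "enhanced" means without touching the PSO loop.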


Real-Time Vehicle Make and Model Recognition with the Residual SqueezeNet Architecture

Sensors ◽

10.3390/s19050982 ◽

2019 ◽

Vol 19 (5) ◽

pp. 982 ◽

Cited By ~ 9

Author(s):

Hyo Lee ◽

Ihsan Ullah ◽

Weiguo Wan ◽

Yongbin Gao ◽

Zhijun Fang

Keyword(s):

Deep Learning ◽

Real Time ◽

Large Scale ◽

Recognition Rate ◽

Experimental Results ◽

Learning Approach ◽

Deep Model ◽

Proposed Model ◽

Real Time Applications ◽

Model Recognition

Make and model recognition (MMR) of vehicles plays an important role in automatic vision-based systems. This paper proposes a novel deep learning approach for MMR using the SqueezeNet architecture. The frontal views of vehicle images are first extracted and fed into a deep network for training and testing. The SqueezeNet architecture with bypass connections between the Fire modules, a variant of the vanilla SqueezeNet, is employed for this study, which makes our MMR system more efficient. The experimental results on our collected large-scale vehicle datasets indicate that the proposed model achieves a 96.3% recognition rate at the rank-1 level with an economical time slice of 108.8 ms. For inference tasks, the deployed deep model requires less than 5 MB of space and is thus highly viable for real-time applications.


An Efficient Distance and Density Based Outlier Detection Approach

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.155-156.342 ◽

2012 ◽

Vol 155-156 ◽

pp. 342-347 ◽

Cited By ~ 1

Author(s):

Xun Biao Zhong ◽

Xiao Xia Huang

Keyword(s):

Outlier Detection ◽

Real World ◽

High Performance ◽

Detection Problem ◽

Empirical Results ◽

Detection Approach ◽

Real World Datasets ◽

Good Detection

To address the low accuracy and high computational cost of density-based outlier detection, this paper proposes a variance of distance and density (VDD) measure. The proposed k-means clustering and score based VDD (KSVDD) approach can detect outliers efficiently and with high performance. Two real-world datasets are used to illustrate the feasibility of the approach. Empirical results show that KSVDD achieves good detection precision.


An Outlier Detection Method Based on Fuzzy C-Means Clustering

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.419-420.165 ◽

2009 ◽

Vol 419-420 ◽

pp. 165-168

Author(s):

Qiang Li ◽

Jian Pei Zhang ◽

Guang Sheng Feng

Keyword(s):

Data Mining ◽

Outlier Detection ◽

Detection Method ◽

Experimental Results ◽

Data Mining Techniques ◽

Fuzzy C Means ◽

Second Stage ◽

Fuzzy C Means Clustering ◽

Fcm Clustering ◽

Two Stages

Both fuzzy c-means (FCM) clustering and outlier detection are useful data mining techniques in real applications. In this paper, we show that outlier detection can be achieved as a by-product of fuzzy c-means clustering. The proposed strategy consists of two stages. The first stage is a pure fuzzy c-means process, while the second stage identifies exceptional objects according to a novel metric based on the entropy of membership values. We provide experimental results to demonstrate the effectiveness of our technique.
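The two-stage idea can be sketched in Python as follows. Stage one is a standard fuzzy c-means loop; stage two ranks points by the Shannon entropy of their membership values. The abstract does not define its "novel metric", so plain Shannon entropy is used here as a stand-in assumption, and the names `fuzzy_c_means`, `membership_entropy`, and `rank_by_entropy` are hypothetical. Initial centers are passed in explicitly so the run is deterministic.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Run standard fuzzy c-means from the given initial centers.

    Returns (centers, memberships); memberships[k][i] is the degree to
    which point k belongs to cluster i, and each row sums to 1.
    """
    centers = [list(c) for c in centers]
    n, dim, c = len(points), len(points[0]), len(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            memberships.append(
                [1.0 / sum((d[i] / d[j]) ** exponent for j in range(c))
                 for i in range(c)]
            )
        # Center update: membership-weighted mean of all points.
        for i in range(c):
            w = [memberships[k][i] ** m for k in range(n)]
            total = sum(w)
            centers[i] = [
                sum(w[k] * points[k][a] for k in range(n)) / total
                for a in range(dim)
            ]
    return centers, memberships


def membership_entropy(row):
    """Shannon entropy of one membership row (stand-in for the paper's metric)."""
    return -sum(u * math.log(u) for u in row if u > 0.0)


def rank_by_entropy(points, centers):
    """Return point indices sorted from most to least ambiguous membership."""
    _, memberships = fuzzy_c_means(points, centers)
    scores = [membership_entropy(row) for row in memberships]
    order = sorted(range(len(points)), key=scores.__getitem__, reverse=True)
    return order, scores


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (5.5, 5.5)]  # last point sits between the two clusters
    order, scores = rank_by_entropy(pts, [(0, 0), (11, 11)])
    print("most ambiguous point:", pts[order[0]])
```

A point that belongs clearly to one cluster has memberships near 0 and 1 and therefore low entropy, while a point that fits no cluster well has nearly uniform memberships and high entropy. One caveat of this stand-in metric is that it also flags points lying on the boundary between two legitimate clusters.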


Intrusion Detection Based on Self-Organizing Map and Artificial Immunisation Algorithm

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.439-440.29 ◽

2010 ◽

Vol 439-440 ◽

pp. 29-34 ◽

Cited By ~ 1

Author(s):

Zhen Guo Chen ◽

Guang Hua Zhang ◽

Li Qin Tian ◽

Zi Lin Geng

Keyword(s):

Intrusion Detection ◽

Computer Security ◽

Real World ◽

User Behavior ◽

Experimental Result ◽

System Call ◽

Self Organizing Map ◽

System Calls ◽

Real World Datasets ◽

Self Organizing

The rate of false positives caused by the variability of the environment and of user behavior limits the application of intrusion detection systems in the real world. Intrusion detection is an important technique in the defense-in-depth network security framework and has been a hot topic in computer security in recent years. To address the intrusion detection problem, we introduce the self-organizing map and the artificial immunisation algorithm into intrusion detection. In this paper, we give a method of rule extraction based on the self-organizing map and the artificial immunisation algorithm and use it in intrusion detection. We illustrate our model with a representative dataset and apply it to the real-world MIT lpr system call dataset. We also propose the idea of learning different representations for system call arguments. Results indicate that this information can be used effectively to detect more attacks with reasonable space and time overhead. The experiments therefore suggest that the method is feasible and effective for intrusion detection.


Enhancement of classification accuracy of our Adaptive Classifier using image processing techniques in the field of Medical Data Mining

2015 International Conference on Green Computing and Internet of Things (ICGCIoT) ◽

10.1109/icgciot.2015.7380599 ◽

2015 ◽

Cited By ~ 1

Author(s):

Sneha Chandra ◽

Maneet Kaur

Keyword(s):

Data Mining ◽

Image Processing ◽

Classification Accuracy ◽

Medical Data ◽

Medical Data Mining ◽

Image Processing Techniques ◽

Processing Techniques
