scholarly journals OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Ahmed Elmogy ◽  
Hamada Rizk ◽  
Amany M. Sarhan

In data mining, outlier detection is a major challenge as it has an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. An extensive variety of clustering based approaches have been developed to detect outliers. However they are by nature time consuming which restrict their utilization with real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering based outlier detection framework, (On the Fly Clustering Based Outlier Detection (OFCOD)) is presented. OFCOD enables analysts to effectively find out outliers on time with request even within huge datasets. The proposed framework has been tested and evaluated using two real world datasets with different features and applications; one with 699 records, and another with five millions records. The experimental results show that the performance of the proposed framework outperforms other existing approaches while considering several evaluation metrics.

2020 ◽  
Vol 34 (04) ◽  
pp. 6837-6844
Author(s):  
Xiaojin Zhang ◽  
Honglei Zhuang ◽  
Shengyu Zhang ◽  
Yuan Zhou

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Distinct from the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm in terms of the sample complexity. Experimental results on both synthetic datasets and real-world datasets demonstrate the efficiency of our algorithm.


2015 ◽  
Vol 24 (03) ◽  
pp. 1550003 ◽  
Author(s):  
Armin Daneshpazhouh ◽  
Ashkan Sami

The task of semi-supervised outlier detection is to find the instances that are exceptional from other data, using some labeled examples. In many applications such as fraud detection and intrusion detection, this issue becomes more important. Most existing techniques are unsupervised. On the other hand, semi-supervised approaches use both negative and positive instances to detect outliers. However, in many real world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is utilized to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms the previous unsupervised state-of-the-art methods in detecting outliers.


2021 ◽  
Vol 15 (3) ◽  
pp. 1-33
Author(s):  
Wenjun Jiang ◽  
Jing Chen ◽  
Xiaofei Ding ◽  
Jie Wu ◽  
Jiawei He ◽  
...  

In online systems, including e-commerce platforms, many users resort to the reviews or comments generated by previous consumers for decision making, while their time is limited to deal with many reviews. Therefore, a review summary, which contains all important features in user-generated reviews, is expected. In this article, we study “how to generate a comprehensive review summary from a large number of user-generated reviews.” This can be implemented by text summarization, which mainly has two types of extractive and abstractive approaches. Both of these approaches can deal with both supervised and unsupervised scenarios, but the former may generate redundant and incoherent summaries, while the latter can avoid redundancy but usually can only deal with short sequences. Moreover, both approaches may neglect the sentiment information. To address the above issues, we propose comprehensive Review Summary Generation frameworks to deal with the supervised and unsupervised scenarios. We design two different preprocess models of re-ranking and selecting to identify the important sentences while keeping users’ sentiment in the original reviews. These sentences can be further used to generate review summaries with text summarization methods. Experimental results in seven real-world datasets (Idebate, Rotten Tomatoes Amazon, Yelp, and three unlabelled product review datasets in Amazon) demonstrate that our work performs well in review summary generation. Moreover, the re-ranking and selecting models show different characteristics.


2017 ◽  
Vol 27 (1) ◽  
pp. 169-180 ◽  
Author(s):  
Marton Szemenyei ◽  
Ferenc Vajda

Abstract Dimension reduction and feature selection are fundamental tools for machine learning and data mining. Most existing methods, however, assume that objects are represented by a single vectorial descriptor. In reality, some description methods assign unordered sets or graphs of vectors to a single object, where each vector is assumed to have the same number of dimensions, but is drawn from a different probability distribution. Moreover, some applications (such as pose estimation) may require the recognition of individual vectors (nodes) of an object. In such cases it is essential that the nodes within a single object remain distinguishable after dimension reduction. In this paper we propose new discriminant analysis methods that are able to satisfy two criteria at the same time: separating between classes and between the nodes of an object instance. We analyze and evaluate our methods on several different synthetic and real-world datasets.


Author(s):  
V. Santhi ◽  
B. K. Tripathy

The image quality enhancement process is considered as one of the basic requirement for high-level image processing techniques that demand good quality in images. High-level image processing techniques include feature extraction, morphological processing, pattern recognition, automation engineering, and many more. Many classical enhancement methods are available for enhancing the quality of images and they can be carried out either in spatial domain or in frequency domain. But in real time applications, the quality enhancement process carried out by classical approaches may not serve the purpose. It is required to combine the concept of computational intelligence with the classical approaches to meet the requirements of real-time applications. In recent days, Particle Swarm Optimization (PSO) technique is considered one of the new approaches in optimization techniques and it is used extensively in image processing and pattern recognition applications. In this chapter, image enhancement is considered an optimization problem, and different methods to solve it through PSO are discussed in detail.


Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 982 ◽  
Author(s):  
Hyo Lee ◽  
Ihsan Ullah ◽  
Weiguo Wan ◽  
Yongbin Gao ◽  
Zhijun Fang

Make and model recognition (MMR) of vehicles plays an important role in automatic vision-based systems. This paper proposes a novel deep learning approach for MMR using the SqueezeNet architecture. The frontal views of vehicle images are first extracted and fed into a deep network for training and testing. The SqueezeNet architecture with bypass connections between the Fire modules, a variant of the vanilla SqueezeNet, is employed for this study, which makes our MMR system more efficient. The experimental results on our collected large-scale vehicle datasets indicate that the proposed model achieves 96.3% recognition rate at the rank-1 level with an economical time slice of 108.8 ms. For inference tasks, the deployed deep model requires less than 5 MB of space and thus has a great viability in real-time applications.


2012 ◽  
Vol 155-156 ◽  
pp. 342-347 ◽  
Author(s):  
Xun Biao Zhong ◽  
Xiao Xia Huang

In order to solve the density based outlier detection problem with low accuracy and high computation, a variance of distance and density (VDD) measure is proposed in this paper. And the k-means clustering and score based VDD (KSVDD) approach proposed can efficiently detect outliers with high performance. For illustration, two real-world datasets are utilized to show the feasibility of the approach. Empirical results show that KSVDD has a good detection precision.


2009 ◽  
Vol 419-420 ◽  
pp. 165-168
Author(s):  
Qiang Li ◽  
Jian Pei Zhang ◽  
Guang Sheng Feng

Both fuzzy c-means (FCM) clustering and outlier detection are useful data mining techniques in real applications. In this paper, we show that the task of outlier detection could be achieved as by-product of fuzzy c-means clustering. The proposed strategy consists of two stages. The first stage consists of purely fuzzy c-means process, while the second stage identifies exceptional objects according to a novel metric based on the entropy of membership values. We provide experimental results to demonstrate the effectiveness of our technique.


2010 ◽  
Vol 439-440 ◽  
pp. 29-34 ◽  
Author(s):  
Zhen Guo Chen ◽  
Guang Hua Zhang ◽  
Li Qin Tian ◽  
Zi Lin Geng

The rate of false positives which caused by the variability of environment and user behavior limits the applications of intrusion detecting system in real world. Intrusion detection is an important technique in the defense-in-depth network security framework and a hot topic in computer security in recent years. To solve the intrusion detection question, we introduce the self-organizing map and artificial immunisation algorithm into intrusion detection. In this paper, we give an method of rule extraction based on self-organizing map and artificial immunisation algorithm and used in intrusion detection. After illustrating our model with a representative dataset and applying it to the real-world datasets MIT lpr system calls. The experimental result shown that We propose an idea of learning different representations for system call arguments. Results indicate that this information can be effectively used for detecting more attacks with reasonable space and time overhead. So our experiment is feasible and effective that using in intrusion detection.


Sign in / Sign up

Export Citation Format

Share Document