scholarly journals FP-outlier: Frequent pattern based outlier detection

2005 ◽  
Vol 2 (1) ◽  
pp. 103-118 ◽  
Author(s):  
Zengyou He ◽  
Xiaofei Xu ◽  
Zhexue Huang ◽  
Shengchun Deng

An outlier in a dataset is an observation or a point that is considerably dissimilar to or inconsistent with the remainder of the data. Detection of such outliers is important for many applications and has recently attracted much attention in the data mining research community. In this paper, we present a new method to detect outliers by discovering frequent patterns (or frequent itemsets) from the data set. The outliers are defined as the data transactions that contain less frequent patterns in their itemsets. We define a measure called FPOF (Frequent Pattern Outlier Factor) to detect the outlier transactions and propose the FindFPOF algorithm to discover outliers. The experimental results have shown that our approach outperformed the existing methods on identifying interesting outliers.

2014 ◽  
Vol 635-637 ◽  
pp. 1723-1728
Author(s):  
Shi Bo Zhou ◽  
Wei Xiang Xu

Local outliers detection is an important issue in data mining. By analyzing the limitations of the existing outlier detection algorthms, a local outlier detection algorthm based on coefficient of variation is introduced. This algorthms applies K-means which is strong in outliers searching, divides data set into sections, puts outliers and their nearing clusters into a local neighbourhood, then figures out the local deviation factor of each local neighbourhood by coefficient of variation, as a result, local outliers can more likely be found.The heoretic analysis and experimental results indicate that the method is ef fective and efficient.


Data ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Ahmed Elmogy ◽  
Hamada Rizk ◽  
Amany M. Sarhan

In data mining, outlier detection is a major challenge as it has an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. An extensive variety of clustering based approaches have been developed to detect outliers. However they are by nature time consuming which restrict their utilization with real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering based outlier detection framework, (On the Fly Clustering Based Outlier Detection (OFCOD)) is presented. OFCOD enables analysts to effectively find out outliers on time with request even within huge datasets. The proposed framework has been tested and evaluated using two real world datasets with different features and applications; one with 699 records, and another with five millions records. The experimental results show that the performance of the proposed framework outperforms other existing approaches while considering several evaluation metrics.


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has acquired a realistic response from data mining scientists as a graph of its reputation has increased smoothly in various practical domains like product marketing, fraud detection, medical diagnosis, fault detection and so many other fields. High dimensional data subjected to outlier detection poses exceptional challenges for data mining experts and it is because of natural problems of the curse of dimensionality and resemblance of distant and adjoining points. Traditional algorithms and techniques were experimented on full feature space regarding outlier detection. Customary methodologies concentrate largely on low dimensional data and hence show ineffectiveness while discovering anomalies in a data set comprised of a high number of dimensions. It becomes a very difficult and tiresome job to dig out anomalies present in high dimensional data set when all subsets of projections need to be explored. All data points in high dimensional data behave like similar observations because of its intrinsic feature i.e., the distance between observations approaches to zero as the number of dimensions extends towards infinity. This research work proposes a novel technique that explores deviation among all data points and embeds its findings inside well established density-based techniques. This is a state of art technique as it gives a new breadth of research towards resolving inherent problems of high dimensional data where outliers reside within clusters having different densities. A high dimensional dataset from UCI Machine Learning Repository is chosen to test the proposed technique and then its results are compared with that of density-based techniques to evaluate its efficiency.


Data Mining ◽  
2013 ◽  
pp. 142-158
Author(s):  
Baoying Wang ◽  
Aijuan Dong

Clustering and outlier detection are important data mining areas. Online clustering and outlier detection generally work with continuous data streams generated at a rapid rate and have many practical applications, such as network instruction detection and online fraud detection. This chapter first reviews related background of online clustering and outlier detection. Then, an incremental clustering and outlier detection method for market-basket data is proposed and presented in details. This proposed method consists of two phases: weighted affinity measure clustering (WC clustering) and outlier detection. Specifically, given a data set, the WC clustering phase analyzes the data set and groups data items into clusters. Then, outlier detection phase examines each newly arrived transaction against the item clusters formed in WC clustering phase, and determines whether the new transaction is an outlier. Periodically, the newly collected transactions are analyzed using WC clustering to produce an updated set of clusters, against which transactions arrived afterwards are examined. The process is carried out continuously and incrementally. Finally, the future research trends on online data mining are explored at the end of the chapter.


Author(s):  
Fabrizio Angiulli

Data mining techniques can be grouped in four main categories: clustering, classification, dependency detection, and outlier detection. Clustering is the process of partitioning a set of objects into homogeneous groups, or clusters. Classification is the task of assigning objects to one of several predefined categories. Dependency detection searches for pairs of attribute sets which exhibit some degree of correlation in the data set at hand. The outlier detection task can be defined as follows: “Given a set of data points or objects, find the objects that are considerably dissimilar, exceptional or inconsistent with respect to the remaining data”. These exceptional objects as also referred to as outliers. Most of the early methods for outlier identification have been developed in the field of statistics (Hawkins, 1980; Barnett & Lewis, 1994). Hawkins’ definition of outlier clarifies the approach: “An outlier is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”. Indeed, statistical techniques assume that the given data set has a distribution model. Outliers are those points that satisfy a discordancy test, that is, that are significantly far from what would be their expected position given the hypothesized distribution. Many clustering, classification and dependency detection methods produce outliers as a by-product of their main task. For example, in classification, mislabeled objects are considered outliers and thus they are removed from the training set to improve the accuracy of the resulting classifier, while in clustering, objects that do not strongly belong to any cluster are considered outliers. Nevertheless, it must be said that searching for outliers through techniques specifically designed for tasks different from outlier detection could not be advantageous. As an example, clusters can be distorted by outliers and, thus, the quality of the outliers returned is affected by their presence. Moreover, other than returning a solution of higher quality, outlier detection algorithms can be vastly more efficient than non ad-hoc algorithms. While in many contexts outliers are considered as noise that must be eliminated, as pointed out elsewhere, “one person’s noise could be another person’s signal”, and thus outliers themselves can be of great interest. Outlier mining is used in telecom or credit card frauds to detect the atypical usage of telecom services or credit cards, in intrusion detection for detecting unauthorized accesses, in medical analysis to test abnormal reactions to new medical therapies, in marketing and customer segmentations to identify customers spending much more or much less than average customer, in surveillance systems, in data cleaning, and in many other fields.


2017 ◽  
Vol 10 (13) ◽  
pp. 191
Author(s):  
Nikhil Jamdar ◽  
A Vijayalakshmi

There are many algorithms available in data mining to search interesting patterns from transactional databases of precise data. Frequent pattern mining is a technique to find the frequently occurred items in data mining. Most of the techniques used to find all the interesting patterns from a collection of precise data, where items occurred in each transaction are certainly known to the system. As well as in many real-time applications, users are interested in a tiny portion of large frequent patterns. So the proposed user constrained mining approach, will help to find frequent patterns in which user is interested. This approach will efficiently find user interested frequent patterns by applying user constraints on the collections of uncertain data. The user can specify their own interest in the form of constraints and uses the Map Reduce model to find uncertain frequent pattern that satisfy the user-specified constraints 


2009 ◽  
Vol 419-420 ◽  
pp. 165-168
Author(s):  
Qiang Li ◽  
Jian Pei Zhang ◽  
Guang Sheng Feng

Both fuzzy c-means (FCM) clustering and outlier detection are useful data mining techniques in real applications. In this paper, we show that the task of outlier detection could be achieved as by-product of fuzzy c-means clustering. The proposed strategy consists of two stages. The first stage consists of purely fuzzy c-means process, while the second stage identifies exceptional objects according to a novel metric based on the entropy of membership values. We provide experimental results to demonstrate the effectiveness of our technique.


2013 ◽  
Vol 411-414 ◽  
pp. 386-389 ◽  
Author(s):  
Tian Tian Xu ◽  
Xiang Jun Dong

Negative frequent itemsets (NFIS) like (a1a2¬a3a4) have played important roles in real applications because we can mine valued negative association rules from them. In one of our previous work, we proposed a method, namede-NFISto mine NFIS from positive frequent itemsets (PFIS). However,e-NFISonly uses single minimum support, which implicitly assumes that all items in the database are of the same nature or of similar frequencies in the database. This is often not the case in real-life applications. So a lot of methods to mine frequent itemsets with multiple minimum supports have been proposed. These methods allow users to assign different minimum supports to different items. But these methods only mine PFIS, doesn’t consider negative ones. So in this paper, we propose a new method, namede-msNFIS, to mine NFIS from PFIS based on multiple minimum supports. E-msNFIScontains three steps: 1) using existing methods to mine PFIS with multiple minimum supports; 2) using the same method ine-NFISto generate NCIS from PFIS got in step 1; 3) calculating the support of these NCIS only using the support of PFIS and then gettingNFIS. Experimental results show that thee-msNFISis efficient.


2005 ◽  
Vol 1 (3) ◽  
pp. 129-135
Author(s):  
Jun Luo ◽  
Sanguthevar Rajasekaran

Association rules mining is an important data mining problem that has been studied extensively. In this paper, a simple but Fast algorithm for Intersecting attributes lists using hash Tables (FIT) is presented. FIT is designed for efficiently computing all the frequent itemsets in large databases. It deploys an idea similar to Eclat but has a much better computational performance than Eclat due to two reasons: 1) FIT makes fewer total number of comparisons for each intersection operation between two attributes lists, and 2) FIT significantly reduces the total number of intersection operations. Our experimental results demonstrate that the performance of FIT is much better than that of Eclat and Apriori algorithms.


2012 ◽  
Vol 195-196 ◽  
pp. 984-986
Author(s):  
Ming Ru Zhao ◽  
Yuan Sun ◽  
Jian Guo ◽  
Ping Ping Dong

Frequent itemsets mining is an important data mining task and a focused theme in data mining research. Apriori algorithm is one of the most important algorithm of mining frequent itemsets. However, the Apriori algorithm scans the database too many times, so its efficiency is relatively low. The paper has therefore conducted a research on the mining frequent itemsets algorithm based on a across linker. Through comparing with the classical algorithm, the improved algorithm has obvious advantages.


Sign in / Sign up

Export Citation Format

Share Document