An Outlier Detection Method Based on Fuzzy C-Means Clustering

2009 ◽  
Vol 419-420 ◽  
pp. 165-168
Author(s):  
Qiang Li ◽  
Jian Pei Zhang ◽  
Guang Sheng Feng

Both fuzzy c-means (FCM) clustering and outlier detection are useful data mining techniques in real applications. In this paper, we show that outlier detection can be achieved as a by-product of fuzzy c-means clustering. The proposed strategy consists of two stages. The first stage is a pure fuzzy c-means process, while the second stage identifies exceptional objects according to a novel metric based on the entropy of membership values. We provide experimental results to demonstrate the effectiveness of our technique.
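The two-stage idea can be illustrated with a minimal sketch. The abstract does not give the exact metric, so the score below is an assumption: plain Shannon entropy of each object's membership row, computed after a standard FCM run. The function names, the fuzzifier m = 2, the fixed iteration count and the toy data are all hypothetical choices for illustration, not the authors' implementation.

```python
import math

def memberships(points, centers, m=2.0):
    """Standard FCM membership update; row k holds point k's degrees in each cluster."""
    power = 2.0 / (m - 1.0)
    table = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: split full membership among those centers.
            hits = dists.count(0.0)
            row = [1.0 / hits if d == 0.0 else 0.0 for d in dists]
        else:
            row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
        table.append(row)
    return table

def update_centers(points, table, m=2.0):
    """Standard FCM center update: mean of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(table[0])):
        weights = [row[i] ** m for row in table]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers

def fcm(points, init_centers, m=2.0, iters=50):
    """Stage 1: alternate the two updates for a fixed number of iterations."""
    centers = list(init_centers)
    for _ in range(iters):
        centers = update_centers(points, memberships(points, centers, m), m)
    return centers, memberships(points, centers, m)

def membership_entropy(row):
    """Stage 2 (assumed score): Shannon entropy of one membership row."""
    return -sum(u * math.log(u) for u in row if u > 0.0)

if __name__ == "__main__":
    # Hypothetical toy data: two tight groups and one isolated point.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
            (5.5, 5.5)]
    _, table = fcm(data, [(0.0, 0.0), (10.0, 10.0)])
    for point, row in zip(data, table):
        print(point, round(membership_entropy(row), 3))
```

Objects whose memberships are spread almost evenly across clusters receive the highest entropy, which is why such a score can single out points that belong clearly to no cluster.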

Data ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Ahmed Elmogy ◽  
Hamada Rizk ◽  
Amany M. Sarhan

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data analysis, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers promptly on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.


2013 ◽  
Vol 765-767 ◽  
pp. 670-673
Author(s):  
Li Bo Hou

The fuzzy C-means (FCM) clustering algorithm is one of the most widely applied algorithms in unsupervised pattern recognition. However, FCM requires a great deal of computation during its iterative process, especially when the feature vectors are high-dimensional; applying the clustering algorithm directly to partition such data is not only inefficient but may also lead to the "curse of dimensionality." To address this problem, this paper analyzes the behavior of the fuzzy C-means clustering algorithm on high-dimensional features, where finding the cluster centers is an NP-hard problem. To improve the effectiveness and real-time performance of FCM in high-dimensional feature analysis, an improved algorithm, FCM-LI, is proposed that combines FCM with the landmark isometric mapping (L-ISOMAP) algorithm. After a preliminary analysis of the samples, the clustering results and the correlations in the sample data are used with L-ISOMAP to reduce the dimensionality, and further analysis on that basis yields the final results. Finally, experimental results demonstrate the effectiveness and real-time performance of the FCM-LI algorithm in high-dimensional feature analysis.


Author(s):  
Chunhua Ren ◽  
Linfu Sun

The classic Fuzzy C-means (FCM) algorithm has limited clustering performance and is prone to misclassification of border points. This study offers a bi-directional FCM clustering ensemble approach that takes local information into account (LI_BIFCM) to overcome these challenges and increase clustering quality. First, various membership matrices are created by running FCM multiple times with randomized initial cluster centers, and a vertical ensemble is performed using the maximum membership principle. Second, after each execution of FCM, multiple local membership matrices of the sample points are created using multiple K-nearest neighbors, and a horizontal ensemble is performed. Multiple horizontal ensembles can be created from multiple FCM runs. Finally, the final clustering results are obtained by combining the vertical and horizontal clustering ensembles. Twelve data sets were chosen for testing from both synthetic and real data sources. In the experiments, LI_BIFCM outperformed four traditional clustering algorithms and three clustering ensemble algorithms. Furthermore, the final clustering results have only a weak correlation with the bi-directional cluster ensemble parameters, indicating that the suggested technique is robust.


Author(s):  
Frank Rehm ◽  
Roland Winkler ◽  
Rudolf Kruse

A well-known issue with prototype-based clustering is the user’s obligation to know the right number of clusters in a dataset in advance or to determine it as a part of the data analysis process. There are different approaches to cope with this non-trivial problem. This chapter addresses the problem as an integrated part of the clustering process. An extension to repulsive fuzzy c-means clustering is proposed that equips non-Euclidean prototypes with repulsive properties. Experimental results are presented that demonstrate the feasibility of the authors’ technique.


2017 ◽  
Vol 24 (5) ◽  
pp. 1253-1268
Author(s):  
Thamaraiselvan Natarajan ◽  
Sridevi Periaiya ◽  
Senthil Arasu Balasubramaniam ◽  
Thushara Srinivasan

Purpose: The purpose of this paper is to identify and analyse the typology of employee branding in an airline company using fuzzy c-means (FCM) clustering to improve the quality of employee brand (EB). Design/methodology/approach: Data were collected from employees of Air India, Chennai division, using a questionnaire and analysed using FCM to find the optimum cluster number. The nature of each cluster was analysed to determine its type. Findings: The results prove the presence of four types of EB, namely, all-stars, injured reserves, rookies and strike-out kings, in the aviation company. It is proven that all-stars have a high level of knowledge of the desired brand (KDB) and psychological contract (PC), injured reserves have high KDB and low PC, rookies have low KDB and high PC, and strike-out kings have low KDB and low PC. Research limitations/implications: The results of this study are limited to Air India employees. This study contributes to employee branding by empirically substantiating the proposed typology using FCM. It proposes the need to analyse organisations individually before making comparisons. Practical implications: The management must focus on the quality of training and development programmes to enhance the position of rookies and strike-out kings. It must also receive regular feedback from injured reserves and strike-out kings to evaluate their perception of PC. Originality/value: This is the first paper to empirically prove the typology of employee branding and to implement FCM in clustering employees for enhancing the EB’s quality.


2014 ◽  
Vol 635-637 ◽  
pp. 1723-1728
Author(s):  
Shi Bo Zhou ◽  
Wei Xiang Xu

Local outlier detection is an important issue in data mining. Based on an analysis of the limitations of existing outlier detection algorithms, a local outlier detection algorithm based on the coefficient of variation is introduced. The algorithm applies K-means, which is strong at finding outliers, to divide the data set into sections, places outliers and their neighbouring clusters into a local neighbourhood, and then computes the local deviation factor of each local neighbourhood using the coefficient of variation. As a result, local outliers are more likely to be found. Theoretical analysis and experimental results indicate that the method is effective and efficient.
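The coefficient of variation itself is simply the standard deviation divided by the mean. The abstract does not define the local deviation factor, so the sketch below is only an assumption about how such a factor might behave: it measures the coefficient of variation of the distances from each point in a neighbourhood to that neighbourhood's centroid. The function names and the toy neighbourhoods are hypothetical.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean

def distances_to_centroid(points):
    """Euclidean distance from every point to the centroid of the neighbourhood."""
    dim = len(points[0])
    centroid = tuple(statistics.fmean(p[d] for p in points) for d in range(dim))
    return [math.dist(p, centroid) for p in points]

if __name__ == "__main__":
    # Hypothetical neighbourhoods: a compact one, and the same one plus a stray point.
    compact = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    with_stray = compact + [(8.0, 8.0)]
    print(coefficient_of_variation(distances_to_centroid(compact)))
    print(coefficient_of_variation(distances_to_centroid(with_stray)))
```

In this toy case the neighbourhood that contains the stray point has a much larger coefficient of variation than the compact one, which is the kind of contrast a deviation factor of this sort would rely on.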


2017 ◽  
Vol 63 (No. 8) ◽  
pp. 370-380
Author(s):  
Jafarzadeh Ali Akbar ◽  
Mahdavi Ali ◽  
Jafarzadeh Heydar

In this study we evaluated forest fire risk in the west of Iran using the Apriori algorithm and fuzzy c-means (FCM) clustering. We used twelve different input parameters to model fire risk in Ilam Province. Our results with minimum support and minimum confidence show strong relationships between wildfire occurrence and eight variables (distance from settlement, population density, distance from road, slope, standing dead oak trees, temperature, land cover and distance from farm land). In this study, we defined three clusters for each variable: low, middle and high. The data regarding the factors affecting forest fire risk were distributed in these three clusters with different degrees of membership and the final map of all factors was classified by FCM clustering. Each layer was then created in a geographic information system. Finally, wildfire risks in the area obtained from overlaying these layers were classified into five categories, from very low to very high according to the degree of danger.
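For readers unfamiliar with the two Apriori thresholds mentioned above, the following sketch computes support and confidence for a candidate rule. The records and item names are made up for illustration and are not taken from the study; only the definitions of support and confidence are standard.

```python
def support(records, itemset):
    """Fraction of records that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= record for record in records) / len(records)

def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent): support of both over support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)

if __name__ == "__main__":
    # Hypothetical observations, each a set of conditions seen at one location.
    records = [
        {"steep_slope", "near_road", "fire"},
        {"steep_slope", "fire"},
        {"near_road"},
        {"steep_slope", "near_road", "fire"},
    ]
    print(support(records, {"steep_slope", "fire"}))
    print(confidence(records, {"near_road"}, {"fire"}))
```

A rule is kept only if its support and confidence both reach the chosen minimum values, which is how the algorithm filters candidate relationships between input variables and fire occurrence.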

