An Outlier Detection Method Based on Fuzzy C-Means Clustering

Both fuzzy c-means (FCM) clustering and outlier detection are useful data mining techniques in real applications. In this paper, we show that outlier detection can be achieved as a by-product of fuzzy c-means clustering. The proposed strategy consists of two stages. The first stage is a pure fuzzy c-means process, while the second stage identifies exceptional objects according to a novel metric based on the entropy of membership values. We provide experimental results to demonstrate the effectiveness of our technique.
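
The two-stage idea can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact metric, so this sketch assumes the normalised Shannon entropy of each point's membership row as the outlier score. The function names `fuzzy_c_means` and `membership_entropy`, the fuzzifier `m=2.0`, and the stopping rule are illustrative choices, not the authors' implementation.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Stage 1: plain FCM. Returns (centers, U), U[i, k] = membership of point i in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def membership_entropy(U):
    """Stage 2 (assumed metric): normalised Shannon entropy of each membership row, in [0, 1]."""
    P = np.clip(U, 1e-12, 1.0)
    return -(P * np.log(P)).sum(axis=1) / np.log(U.shape[1])
```

Under this assumption, a point that belongs clearly to one cluster scores near 0, while a point that is roughly equally far from every cluster centre scores near 1 and is a candidate outlier.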

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge because it plays an important role in many applications, such as medical data, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively, in time and on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches on several evaluation metrics.

Educational data mining for students' performance based on fuzzy C-means clustering

The Journal of Engineering ◽

10.1049/joe.2019.0938 ◽

2019 ◽

Vol 2019 (11) ◽

pp. 8245-8250

Author(s):

Yu Li ◽

Jin Gou ◽

Zongwen Fan

Keyword(s):

Data Mining ◽

Educational Data Mining ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering

Improved Fuzzy FCM-LI Algorithm

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.765-767.670 ◽

2013 ◽

Vol 765-767 ◽

pp. 670-673

Author(s):

Li Bo Hou

Keyword(s):

Real Time ◽

Clustering Algorithm ◽

Feature Analysis ◽

Cluster Center ◽

High Dimensional ◽

Fuzzy C Means ◽

Sample Data ◽

Fuzzy C Means Clustering ◽

Fcm Clustering ◽

Np Hard Problem

The fuzzy c-means (FCM) clustering algorithm is one of the most widely applied algorithms in unsupervised pattern recognition. However, the iterative FCM process requires a great deal of computation, especially when the feature vectors are high-dimensional; using the clustering algorithm to partition such data is not only inefficient but may also lead to the "curse of dimensionality." To address this problem, this paper analyzes the fuzzy c-means clustering algorithm in high-dimensional feature analysis, where finding the cluster centers is an NP-hard problem. To improve the effectiveness and real-time performance of fuzzy c-means clustering in high-dimensional feature analysis, the paper combines it with the landmark isometric mapping (L-ISOMAP) algorithm and proposes the improved algorithm FCM-LI. After a preliminary analysis of the samples, the clustering results and the correlation of the sample data are used with L-ISOMAP to reduce the dimension, and further analysis on that basis yields the final results. Finally, experimental results show the effectiveness and real-time performance of the FCM-LI algorithm in high-dimensional feature analysis.

A Bi-directional Fuzzy C-Means Clustering Ensemble Algorithm Considering Local Information

International Journal of Computational Intelligence Systems ◽

10.1007/s44196-021-00014-z ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Chunhua Ren ◽

Linfu Sun

Keyword(s):

Clustering Algorithms ◽

Real Data ◽

Local Information ◽

Data Sets ◽

Clustering Ensemble ◽

K Nearest Neighbors ◽

Fuzzy C Means ◽

Clustering Quality ◽

Fuzzy C Means Clustering ◽

Fcm Clustering

The classic fuzzy c-means (FCM) algorithm has limited clustering performance and is prone to misclassification of border points. This study offers a bi-directional FCM clustering ensemble approach that takes local information into account (LI_BIFCM) to overcome these challenges and increase clustering quality. First, various membership matrices are created after running FCM multiple times, based on the randomization of the initial cluster centers, and a vertical ensemble is performed using the maximum membership principle. Second, after each execution of FCM, multiple local membership matrices of the sample points are created using multiple K-nearest neighbors, and a horizontal ensemble is performed. Multiple horizontal ensembles can be created using multiple FCM clusterings. Finally, the final clustering results are obtained by combining the vertical and horizontal clustering ensembles. Twelve data sets were chosen for testing from both synthetic and real data sources. In the experiments, LI_BIFCM outperformed four traditional clustering algorithms and three clustering ensemble algorithms. Furthermore, the final clustering results have a weak correlation with the bi-directional cluster ensemble parameters, indicating that the suggested technique is robust.
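
Two ingredients named in this abstract, local membership from K-nearest neighbors and the maximum membership principle, can be sketched in Python. The abstract does not say how LI_BIFCM builds its local membership matrices, so this sketch assumes the simplest reading: a point's local membership is the mean membership of its k nearest neighbors, excluding itself. The names `local_membership` and `max_membership_labels` are hypothetical, and the ensemble steps themselves are not reproduced.

```python
import numpy as np

def local_membership(X, U, k):
    """Assumed local membership: mean of the membership rows of each point's k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbours
    return U[nn].mean(axis=1)                  # shape (n_points, n_clusters)

def max_membership_labels(U):
    """Maximum membership principle: assign each point to its highest-membership cluster."""
    return np.argmax(U, axis=1)
```

In this reading, a border point whose own membership leans slightly towards the wrong cluster can be pulled back by neighbors that belong clearly to the right one.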

Improved Thyroid Disease Prediction Model Using Data Mining Techniques with Outlier Detection

Intelligent Systems Reference Library - Advanced Machine Learning Approaches in Cancer Prognosis ◽

10.1007/978-3-030-71975-3_5 ◽

2021 ◽

pp. 129-161

Author(s):

Yasir Iqbal Mir

Keyword(s):

Data Mining ◽

Prediction Model ◽

Outlier Detection ◽

Thyroid Disease ◽

Disease Prediction ◽

Data Mining Techniques ◽

Using Data

Fuzzy Clustering with Repulsive Prototypes

Scalable Fuzzy Algorithms for Data Management and Analysis ◽

10.4018/978-1-60566-858-1.ch013 ◽

2010 ◽

pp. 332-346

Author(s):

Frank Rehm ◽

Roland Winkler ◽

Rudolf Kruse

Keyword(s):

Data Analysis ◽

Fuzzy Clustering ◽

Experimental Results ◽

Number Of Clusters ◽

Fuzzy C Means ◽

Analysis Process ◽

Fuzzy C Means Clustering ◽

The Right

A well-known issue with prototype-based clustering is the user’s obligation to know the right number of clusters in a dataset in advance or to determine it as part of the data analysis process. There are different approaches to cope with this non-trivial problem. This chapter addresses the problem as an integrated part of the clustering process. An extension to repulsive fuzzy c-means clustering is proposed that equips non-Euclidean prototypes with repulsive properties. Experimental results are presented that demonstrate the feasibility of the authors’ technique.

Prognosis of Diabetes Using Data mining Approach-Fuzzy C Means Clustering and Support Vector Machine

International Journal of Computer Trends and Technology ◽

10.14445/22312803/ijctt-v11p120 ◽

2014 ◽

Vol 11 (2) ◽

pp. 94-98 ◽

Cited By ~ 14

Author(s):

Ravi Sanakal ◽

Smt. T Jayakumari

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Support Vector ◽

Fuzzy C Means ◽

Data Mining Approach ◽

Fuzzy C Means Clustering ◽

Using Data

Identification and analysis of employee branding typology using fuzzy c-means clustering

Benchmarking An International Journal ◽

10.1108/bij-01-2016-0010 ◽

2017 ◽

Vol 24 (5) ◽

pp. 1253-1268

Author(s):

Thamaraiselvan Natarajan ◽

Sridevi Periaiya ◽

Senthil Arasu Balasubramaniam ◽

Thushara Srinivasan

Keyword(s):

Psychological Contract ◽

Training And Development ◽

Cluster Number ◽

Content Type ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering ◽

Fcm Clustering ◽

Level Of Knowledge ◽

High Level

Purpose: The purpose of this paper is to identify and analyse the typology of employee branding in an airline company using fuzzy c-means (FCM) clustering to improve the quality of employee brand (EB). Design/methodology/approach: Data were collected from employees of Air India, Chennai division, using a questionnaire and analysed using FCM to find the optimum cluster number. The nature of each cluster was analysed to determine its type. Findings: The results prove the presence of four types of EB, namely, all-stars, injured reserves, rookies and strike-out kings, in the aviation company. It is proven that employees in all-stars have a high level of knowledge of the desired brand (KDB) and psychological contract (PC), those in injured reserves have high KDB and low PC, rookies have low KDB and high PC, and strike-out kings have low KDB and PC. Research limitations/implications: The results of this study are limited to the Air India employees. This study contributes to employee branding by empirically substantiating the proposed typology using FCM. It proposes the need to analyse organisations individually before comparisons. Practical implications: The management must focus on the quality of training and development programmes to enhance the position of rookies and strike-out kings. It must also receive regular feedback from injured reserves and strike-out kings to evaluate their perception of PC. Originality/value: This is the first paper to empirically prove the typology of employee branding and to implement FCM in clustering employees for enhancing the EB’s quality.

Local Outlier Detection Algorithm Based on Coefficient of Variation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.635-637.1723 ◽

2014 ◽

Vol 635-637 ◽

pp. 1723-1728

Author(s):

Shi Bo Zhou ◽

Wei Xiang Xu

Keyword(s):

Data Mining ◽

Outlier Detection ◽

Coefficient Of Variation ◽

Detection Algorithm ◽

Experimental Results ◽

Data Set ◽

Outliers Detection ◽

Deviation Factor ◽

Local Deviation ◽

Local Outlier

Local outlier detection is an important issue in data mining. By analyzing the limitations of existing outlier detection algorithms, a local outlier detection algorithm based on the coefficient of variation is introduced. The algorithm applies K-means, which is strong at searching for outliers, to divide the data set into sections, puts outliers and their neighbouring clusters into a local neighbourhood, and then computes the local deviation factor of each local neighbourhood using the coefficient of variation; as a result, local outliers are more likely to be found. Theoretical analysis and experimental results indicate that the method is effective and efficient.
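
The coefficient of variation is the standard deviation divided by the mean. The abstract does not define the paper's local deviation factor, so the following Python sketch assumes one plausible form: the coefficient of variation of the distances from a neighbourhood's points to its centroid. The function name `neighbourhood_cv` is hypothetical, and the K-means partitioning step is left out.

```python
import numpy as np

def neighbourhood_cv(points):
    """Assumed deviation factor: std / mean of the point-to-centroid distances in one neighbourhood."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    dist = np.linalg.norm(points - centroid, axis=1)
    mean = dist.mean()
    if mean == 0.0:                            # all points coincide: no spread at all
        return 0.0
    return float(dist.std() / mean)
```

A neighbourhood whose points sit at similar distances from the centroid gives a value near 0, while adding one far-away point raises the value sharply, which is the behaviour a deviation factor of this kind relies on.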

Evaluation of forest fire risk using the Apriori algorithm and fuzzy c-means clustering

Journal of Forest Science ◽

10.17221/7/2017-jfs ◽

2017 ◽

Vol 63 (No. 8) ◽

pp. 370-380 ◽

Cited By ~ 9

Author(s):

Jafarzadeh Ali Akbar ◽

Mahdavi Ali ◽

Jafarzadeh Heydar

Keyword(s):

Forest Fire ◽

Fire Risk ◽

Apriori Algorithm ◽

Factors Affecting ◽

Wildfire Occurrence ◽

Fuzzy C Means ◽

Standing Dead ◽

Fuzzy C Means Clustering ◽

Fcm Clustering ◽

Forest Fire Risk

In this study we evaluated forest fire risk in the west of Iran using the Apriori algorithm and fuzzy c-means (FCM) clustering. We used twelve different input parameters to model fire risk in Ilam Province. Our results with minimum support and minimum confidence show strong relationships between wildfire occurrence and eight variables (distance from settlement, population density, distance from road, slope, standing dead oak trees, temperature, land cover and distance from farm land). In this study, we defined three clusters for each variable: low, middle and high. The data regarding the factors affecting forest fire risk were distributed in these three clusters with different degrees of membership and the final map of all factors was classified by FCM clustering. Each layer was then created in a geographic information system. Finally, wildfire risks in the area obtained from overlaying these layers were classified into five categories, from very low to very high according to the degree of danger.
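
The minimum support and minimum confidence thresholds mentioned here are the two standard measures used by the Apriori algorithm. As a minimal Python sketch: the support of an itemset is the fraction of records that contain it, and the confidence of a rule A → B is support(A ∪ B) / support(A). The records and item names below are invented for illustration and are not data from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Invented toy records (hypothetical item names, not from the study).
records = [
    {"near_road", "steep_slope", "fire"},
    {"near_road", "fire"},
    {"near_road"},
    {"steep_slope"},
]
rule_support = support(records, {"near_road", "fire"})
rule_confidence = confidence(records, {"near_road"}, {"fire"})
```

A rule is kept only if both values reach the chosen thresholds; the thresholds used in the study are not stated in the abstract.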
