Outlier Detection: A Research and Modified Method using Fuzzy Clustering

Data mining is becoming increasingly popular in many application fields. Owing to this advancement, researchers show great interest in finding unexpected behaviour in large datasets. Outlier detection has been studied extensively in data mining; some techniques were developed for specific application domains, while others are generic in nature. It is one of the most important and active research topics and faces a series of new challenges. Outliers occur due to changes in system behaviour, mechanical faults, human error, natural deviations and instrument error. This paper provides a brief survey of outlier detection and a modified approach to detecting outliers using fuzzy clustering. It also provides a better understanding of the different dimensions of outlier detection as applied in various substantive areas.

MONITORING OF OPERATIONAL LOGISTIC PROCESSES IN GENERAL CARGO WAREHOUSES USING PREDICTIVE ANALYSIS

Advanced Logistic Systems - Theory and Practice ◽ 10.32971/als.2019.009 ◽ 2019 ◽ Vol 13 (1) ◽ pp. 27-36

Author(s): Andreas Neubert

Keyword(s): Data Mining ◽ Human Error ◽ Predictive Analysis ◽ Data Mining Algorithm ◽ IT Infrastructure ◽ Monitoring Procedure ◽ Manual Recording ◽ Mining Tool ◽ General Cargo ◽ Different Characteristics

Due to the different characteristics of the piece goods (e.g. size and weight), they are transported in general cargo warehouses by manually-operated industrial trucks such as forklifts and pallet trucks. Since manual activities are susceptible to human error, errors occur in logistical processes in general cargo warehouses. This leads to incorrect loading and stacking and to damage to storage equipment and general cargo. Costs arising from errors in logistical processes could be reduced if these errors were remedied in advance. This paper presents a monitoring procedure for logistical processes in manually-operated general cargo warehouses, to which predictive analysis is applied. Seven steps for integrating predictive analysis into the IT infrastructure of general cargo warehouses are introduced and described in detail. The results of this paper are the CRISP4BigData model, the SVM data mining algorithm, the data mining tool R, and the programming language C++ for scoring in general cargo warehouses. After the system has been created and installed in general cargo warehouses, initial results obtained with this method over a certain time span will be compared with results obtained without it, through manual recording, over the same period.

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽ 10.3390/data6010001 ◽ 2020 ◽ Vol 6 (1) ◽ pp. 1

Author(s): Ahmed Elmogy ◽ Hamada Rizk ◽ Amany M. Sarhan

Keyword(s): Data Mining ◽ Image Processing ◽ Intrusion Detection ◽ Real Time ◽ Outlier Detection ◽ Real World ◽ Medical Data ◽ Experimental Results ◽ Real Time Applications ◽ Real World Datasets

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data, image processing, fraud detection and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively, in time and on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches on several evaluation metrics.

A Novel Density-based Technique for Outlier Detection of High Dimensional Data Utilizing Full Feature Space

Information Technology And Control ◽ 10.5755/j01.itc.50.1.25588 ◽ 2021 ◽ Vol 50 (1) ◽ pp. 138-152

Author(s): Mujeeb Ur Rehman ◽ Dost Muhammad Khan

Keyword(s): Data Mining ◽ Outlier Detection ◽ High Dimensional Data ◽ Research Work ◽ Feature Space ◽ High Dimensional ◽ Data Set ◽ Data Points ◽ Low Dimensional ◽ Intrinsic Feature

Recently, anomaly detection has attracted considerable attention from data mining researchers, as its popularity has grown steadily in practical domains such as product marketing, fraud detection, medical diagnosis, fault detection and many others. High-dimensional data poses exceptional challenges for outlier detection because of the curse of dimensionality and the resemblance between distant and adjoining points. Traditional algorithms and techniques for outlier detection were tested on the full feature space. Customary methodologies concentrate largely on low-dimensional data and hence are ineffective at discovering anomalies in a data set with a high number of dimensions. Digging out the anomalies present in a high-dimensional data set becomes very difficult and tiresome when all subsets of projections need to be explored. All data points in high-dimensional data behave like similar observations because of an intrinsic feature of such data: the relative difference between the distances to the nearest and farthest observations approaches zero as the number of dimensions tends to infinity. This research work proposes a novel technique that explores deviation among all data points and embeds its findings inside well-established density-based techniques. It is a state-of-the-art technique in that it opens a new avenue of research towards resolving the inherent problems of high-dimensional data where outliers reside within clusters of different densities. A high-dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are then compared with those of density-based techniques to evaluate its efficiency.
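
The concentration effect this abstract refers to is usually stated in terms of relative contrast: as dimensionality grows, the gap between the farthest and nearest neighbour distances shrinks relative to the nearest distance. The following is a minimal sketch of that effect only, not the authors' technique; the function name `relative_contrast`, the uniform random data and the sample sizes are assumptions chosen for illustration.

```python
import math
import random


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points in the unit hypercube."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Assumed settings: 200 points, a handful of dimensionalities.
    for dims in (2, 10, 100, 1000):
        print(dims, round(relative_contrast(200, dims), 3))
```

On uniform data like this, the printed contrast is typically large in 2 dimensions and small in 1000, which is one reason distance thresholds tuned on low-dimensional data tend to stop separating outliers from inliers.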

Combined data mining techniques based patient data outlier detection for healthcare safety

International Journal of Intelligent Computing and Cybernetics ◽ 10.1108/ijicc-07-2015-0024 ◽ 2016 ◽ Vol 9 (1) ◽ pp. 42-68 ◽ Cited By ~ 10

Author(s): Gebeyehu Belay Gebremeskel ◽ Chai Yi ◽ Zhongshi He ◽ Dawit Haile

Keyword(s): Data Mining ◽ Decision Making ◽ Patient Safety ◽ Outlier Detection ◽ Clinical Data ◽ Clinical Decision Making ◽ Clinical Decision ◽ Healthcare Services ◽ Content Type ◽ Outliers Detection

Purpose – Among the growing number of data mining (DM) techniques, outlier detection has gained importance in many applications and has attracted much attention in recent times. In the past, research papers on outlier detection in safety care treated it as searching for needles in a haystack. However, outliers are not always erroneous. Therefore, the purpose of this paper is to investigate the role of outliers in healthcare services in general and in patient safety care in particular. Design/methodology/approach – The paper uses a combined DM technique (clustering and nearest neighbor) for outlier detection, which provides a clear understanding of, and meaningful insights into, the behavior of data for healthcare safety. The resulting implicit knowledge is vital to a proper clinical decision-making process. The method matters semantically, and the novel treatment of patients’ events and situations shows that they play a significant role in patient care safety and medication. Findings – The paper discusses a novel, integrated methodology that can be applied to the analysis of different biological data. It integrates DM techniques to optimize performance in the field of health and medical science. The integrated outlier detection method can be extended to search for valuable information and implicit knowledge based on selected patient factors. On this basis, outliers are detected as clusters and point events, and novel ideas are proposed to strengthen clinical services with customer satisfaction in mind. The work can also serve as a baseline for further healthcare strategy development and research. Research limitations/implications – This paper focuses mainly on outlier detection. It does not address outlier isolation, which is essential for investigating why outliers occurred and for communicating how to mitigate them. Therefore, the research can be extended to cover the hierarchy of patient problems. Originality/value – DM is a dynamic and successful gateway to discovering useful knowledge for enhancing healthcare performance and patient safety. Outlier detection based on clinical data is a basic task in achieving healthcare strategy. Therefore, in this paper, the authors focus on combined DM techniques for a deep analysis of clinical data, which support an optimal level of clinical decision-making. Proper clinical decisions can be obtained through attribute selection, which is important for identifying the influential factors or parameters of healthcare services. Therefore, integrating clustering and nearest-neighbor techniques gives more acceptable detection of outliers in such complex data, which could be fundamental to further situational analysis of healthcare and patient safety.

Distance Based Pattern Driven Mining for Outlier Detection in High Dimensional Big Dataset

ACM Transactions on Management Information Systems ◽ 10.1145/3469891 ◽ 2022 ◽ Vol 13 (1) ◽ pp. 1-17

Author(s): Ankit Kumar ◽ Abhishek Kumar ◽ Ali Kashif Bashir ◽ Mamoon Rashid ◽ V. D. Ambeth Kumar ◽ ...

Keyword(s): Data Mining ◽ Comparative Analysis ◽ Outlier Detection ◽ Credit Card ◽ High Dimensional ◽ Work Efficiency ◽ Average Value ◽ Novel Method ◽ Detection Of Outliers ◽ Better Than

Detection of outliers or anomalies is one of the vital issues in pattern-driven data mining. Outlier detection identifies the inconsistent behavior of individual objects. It is an important area of data mining with several different applications, such as detecting credit card fraud, hacking and criminal activities. It is necessary to develop tools that uncover the critical information contained in extensive data. This paper investigates a novel method for detecting cluster outliers in a multidimensional dataset, capable of identifying the clusters and outliers in datasets containing noise. The proposed method can detect the groups and outliers left by the clustering process, like instant irregular sets of clusters (C) and outliers (O), to boost the results. The results obtained after applying the algorithm to the dataset improved in terms of several parameters. For the comparative analysis, the average accuracy and the average recall are computed. The average accuracy is 74.05% for the existing COID algorithm and 77.21% for the proposed algorithm. The average recall is 81.19% for the existing algorithm and 89.51% for the proposed one, which shows that the proposed work is more efficient than the existing COID algorithm.

Research Challenge of Locally Computed Ubiquitous Data Mining

Data Mining ◽ 10.4018/978-1-4666-2455-9.ch101 ◽ 2013 ◽ pp. 1960-1978

Author(s): Aysegul Cayci ◽ João Bártolo Gomes ◽ Andrea Zanda ◽ Ernestina Menasalvas ◽ Santiago Eibe

Keyword(s): Data Mining ◽ Situational Factors ◽ Wireless Sensor ◽ Data Mining Algorithm ◽ Future Directions ◽ Wearable Technologies ◽ Mining Algorithm ◽ New Challenges ◽ Research Challenge

Advances in wireless, sensor, mobile and wearable technologies present new challenges for data mining research on providing mobile applications with intelligence. Autonomy and adaptability requirements are the two most important challenges for data mining in this new environment. In this chapter, to encourage researchers in this area, we analyze the challenges of designing ubiquitous data mining services by examining the issues and problems, paying special attention to context and resource awareness. We focus on the autonomous execution of a data mining algorithm and analyze the situational factors that influence the quality of the result. Existing solutions in this area and future directions of research are also covered in this chapter.

Online Clustering and Outlier Detection

Data Mining ◽ 10.4018/978-1-4666-2455-9.ch008 ◽ 2013 ◽ pp. 142-158

Author(s): Baoying Wang ◽ Aijuan Dong

Keyword(s): Data Mining ◽ Outlier Detection ◽ Future Research ◽ Online Data ◽ Data Set ◽ Online Clustering ◽ Practical Applications ◽ Online Fraud ◽ Mining Areas ◽ Two Phases

Clustering and outlier detection are important data mining areas. Online clustering and outlier detection generally work with continuous data streams generated at a rapid rate and have many practical applications, such as network intrusion detection and online fraud detection. This chapter first reviews the background of online clustering and outlier detection. Then, an incremental clustering and outlier detection method for market-basket data is proposed and presented in detail. The proposed method consists of two phases: weighted affinity measure clustering (WC clustering) and outlier detection. Specifically, given a data set, the WC clustering phase analyzes the data set and groups data items into clusters. The outlier detection phase then examines each newly arrived transaction against the item clusters formed in the WC clustering phase and determines whether the new transaction is an outlier. Periodically, the newly collected transactions are analyzed using WC clustering to produce an updated set of clusters, against which transactions arriving afterwards are examined. The process is carried out continuously and incrementally. Finally, future research trends in online data mining are explored at the end of the chapter.
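
The two-phase flow described in this abstract can be sketched roughly as follows. The abstract does not define the weighted affinity measure, so this is only a stand-in under stated assumptions: `cluster_items` groups items by a raw co-occurrence count (a hypothetical substitute for WC clustering), and `is_outlier` uses a hypothetical rule that flags a transaction whose items do not concentrate in a single item cluster. The names, thresholds and sample data are all illustrative.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List


def cluster_items(transactions: Iterable[List[str]],
                  min_cooccurrence: int = 2) -> Dict[str, str]:
    """Phase 1 stand-in (NOT the chapter's WC clustering): link two items
    when they co-occur in at least min_cooccurrence transactions, then
    take connected components as item clusters. Returns item -> label."""
    counts = defaultdict(int)
    items = set()
    for transaction in transactions:
        unique = sorted(set(transaction))
        items.update(unique)
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1

    parent = {item: item for item in items}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in counts.items():
        if count >= min_cooccurrence:
            parent[find(a)] = find(b)
    return {item: find(item) for item in items}


def is_outlier(transaction: List[str], item_to_cluster: Dict[str, str],
               min_share: float = 0.6) -> bool:
    """Phase 2 stand-in: flag a transaction when less than min_share of
    its items fall into one item cluster. Unseen items match no cluster."""
    unique = set(transaction)
    if not unique:
        return False
    per_cluster = defaultdict(int)
    for item in unique:
        label = item_to_cluster.get(item)
        if label is not None:
            per_cluster[label] += 1
    best = max(per_cluster.values(), default=0)
    return best / len(unique) < min_share


history = [
    ["milk", "bread"], ["milk", "bread", "butter"], ["bread", "butter"],
    ["nails", "hammer"], ["nails", "hammer", "saw"], ["hammer", "saw"],
]
clusters = cluster_items(history)
print(is_outlier(["milk", "bread"], clusters))
print(is_outlier(["milk", "hammer"], clusters))
```

To mimic the periodic update step, one would append newly arrived transactions to `history` and call `cluster_items` again, then score later arrivals against the refreshed clusters.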

Research Challenge of Locally Computed Ubiquitous Data Mining

Handbook of Research on Mobility and Computing ◽ 10.4018/978-1-60960-042-6.ch037 ◽ 2011 ◽ pp. 576-594

Author(s): Aysegul Cayci ◽ João Bártolo Gomes ◽ Andrea Zanda ◽ Ernestina Menasalvas ◽ Santiago Eibe

Keyword(s): Data Mining ◽ Situational Factors ◽ Wireless Sensor ◽ Data Mining Algorithm ◽ Future Directions ◽ Wearable Technologies ◽ Mining Algorithm ◽ New Challenges ◽ Research Challenge

Outlier Detection Techniques for Data Mining

Encyclopedia of Data Warehousing and Mining, Second Edition ◽ 10.4018/978-1-60566-010-3.ch228 ◽ 2011 ◽ pp. 1483-1488

Author(s): Fabrizio Angiulli

Keyword(s): Data Mining ◽ Outlier Detection ◽ Credit Card ◽ Detection Methods ◽ Distribution Model ◽ Main Task ◽ Data Set ◽ Homogeneous Groups ◽ Definition Of ◽ Dependency Detection

Data mining techniques can be grouped into four main categories: clustering, classification, dependency detection, and outlier detection. Clustering is the process of partitioning a set of objects into homogeneous groups, or clusters. Classification is the task of assigning objects to one of several predefined categories. Dependency detection searches for pairs of attribute sets which exhibit some degree of correlation in the data set at hand. The outlier detection task can be defined as follows: “Given a set of data points or objects, find the objects that are considerably dissimilar, exceptional or inconsistent with respect to the remaining data”. These exceptional objects are also referred to as outliers. Most of the early methods for outlier identification were developed in the field of statistics (Hawkins, 1980; Barnett & Lewis, 1994). Hawkins’ definition of an outlier clarifies the approach: “An outlier is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”. Indeed, statistical techniques assume that the given data set has a distribution model. Outliers are those points that satisfy a discordancy test, that is, points that are significantly far from their expected position given the hypothesized distribution. Many clustering, classification and dependency detection methods produce outliers as a by-product of their main task. For example, in classification, mislabeled objects are considered outliers and are therefore removed from the training set to improve the accuracy of the resulting classifier, while in clustering, objects that do not strongly belong to any cluster are considered outliers. Nevertheless, searching for outliers with techniques designed for tasks other than outlier detection may not be advantageous. As an example, clusters can be distorted by outliers and, thus, the quality of the outliers returned is affected by their presence. Moreover, besides returning a higher-quality solution, dedicated outlier detection algorithms can be vastly more efficient than algorithms not designed for the task. While in many contexts outliers are considered noise that must be eliminated, as pointed out elsewhere, “one person’s noise could be another person’s signal”, and thus outliers themselves can be of great interest. Outlier mining is used in telecom and credit card fraud detection to spot atypical usage of telecom services or credit cards, in intrusion detection to detect unauthorized access, in medical analysis to test for abnormal reactions to new medical therapies, in marketing and customer segmentation to identify customers spending much more or much less than the average customer, in surveillance systems, in data cleaning, and in many other fields.
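
As an illustration of the statistical view described in this abstract, the following is a minimal sketch of one simple discordancy test, assuming the data are roughly normally distributed. It is not taken from the chapter; the function name `zscore_outliers`, the threshold of 3 standard deviations and the sample readings are assumptions chosen for illustration.

```python
import statistics
from typing import List


def zscore_outliers(values: List[float], threshold: float = 3.0) -> List[float]:
    """Return the values lying more than `threshold` population standard
    deviations from the mean (a basic test under a normal model)."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]


if __name__ == "__main__":
    # Assumed sample: twenty readings near 10 and one reading at 25.
    readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9] * 2
    readings.append(25.0)
    print(zscore_outliers(readings))
```

A known weakness of this particular test is that extreme values inflate the mean and standard deviation they are judged against, so in small samples a genuine outlier can fall under the threshold.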

Improved Thyroid Disease Prediction Model Using Data Mining Techniques with Outlier Detection

Intelligent Systems Reference Library - Advanced Machine Learning Approaches in Cancer Prognosis ◽ 10.1007/978-3-030-71975-3_5 ◽ 2021 ◽ pp. 129-161

Author(s): Yasir Iqbal Mir

Keyword(s): Data Mining ◽ Prediction Model ◽ Outlier Detection ◽ Thyroid Disease ◽ Disease Prediction ◽ Data Mining Techniques ◽ Using Data
