A spatial filtering inspired three-way clustering approach with application to outlier detection

Abstract

Background: Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs.

Objective: The primary objectives of this research were to develop and test an unsupervised clustering-based anomaly/outlier detection approach for detecting implausible observations in EHR data, as an algorithmic alternative to existing rule-based procedures.

Methods: Our approach rests on two hypotheses: (i) when there is a large number of observations, implausible records should be sparse; therefore, (ii) if these data are clustered properly, sparsely populated clusters should represent implausible observations. To test these hypotheses, we applied an unsupervised clustering algorithm to EHR observation data on 50 laboratory tests. We tested different specifications of the clustering approach and computed confusion matrix indices against a set of silver-standard plausibility thresholds. We compared the results of the proposed approach with those of conventional anomaly detection (CAD) approaches, including standard deviation and Mahalanobis distance.

Results: The clustering approach produced results with exceptional specificity and high sensitivity. Compared with the CAD approaches, the proposed clustering approach produced significantly fewer false positives.

Conclusion: Our contributions include (i) a clustering approach for identifying implausible EHR observations, (ii) evidence that implausible observations are sparse in EHR laboratory test results, (iii) a parallel implementation of the clustering approach on the i2b2 star schema, and (iv) a set of silver-standard plausibility thresholds for 50 laboratory tests that can be used for validation in other studies.
The proposed algorithmic solution can augment human decisions to improve data quality. A workflow is therefore needed to complement the algorithm and to initiate the actions required to improve the quality of the data.
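The abstract does not name the clustering algorithm or its parameters, so the following is only a minimal sketch of the sparse-cluster hypothesis for a single laboratory test, under stated assumptions. It substitutes plain 1-D k-means (an assumption, not necessarily the authors' algorithm), flags every value whose cluster holds less than a hypothetical `min_fraction` of the observations, and scores the flags with sensitivity and specificity against a reference label such as a plausibility range. A z-score rule stands in for the conventional baselines; for a single variable the Mahalanobis distance reduces to |x - mu| / sigma. All function names and defaults here are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar values (assumed stand-in); returns one label per value."""
    xs = sorted(values)
    # Deterministic start: centres at evenly spaced ranks of the sorted data.
    centers = [xs[int(i * (len(xs) - 1) / max(k - 1, 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous centre.
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def flag_sparse_clusters(values, k=5, min_fraction=0.01):
    """Flag each value whose cluster holds less than min_fraction of all values (hypothetical defaults)."""
    labels = kmeans_1d(values, k)
    sizes = Counter(labels)
    return [sizes[lab] / len(values) < min_fraction for lab in labels]


def flag_z_score(values, z=3.0):
    """Conventional baseline: flag values more than z population SDs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) > z * sigma for v in values]


def confusion(flags, truth):
    """Confusion-matrix counts, sensitivity and specificity (True = implausible)."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fp = sum(f and not t for f, t in zip(flags, truth))
    fn = sum(t and not f for f, t in zip(flags, truth))
    tn = sum(not f and not t for f, t in zip(flags, truth))
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

The rank-based initialisation only keeps the sketch reproducible; on real EHR data the flags would depend strongly on the choice of `k` and `min_fraction`, which is presumably why the authors tested several specifications of their approach.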

A Novel Density-Based Clustering Approach for Outlier Detection in High-Dimensional Data

Lecture Notes in Computer Science - Hybrid Artificial Intelligent Systems, 2019, pp. 322-331
DOI: 10.1007/978-3-030-29859-3_28
Cited by ~1

Author(s): Thouraya Aouled Messaoud, Abir Smiti, Aymen Louati

Keyword(s): Outlier Detection, High Dimensional Data, High Dimensional, Density Based Clustering, Clustering Approach

Support Vector Clustering for Outlier Detection

Advanced Materials Research, Vol 756-759, 2013, pp. 493-496
DOI: 10.4028/www.scientific.net/amr.756-759.493
Cited by ~2

Author(s): Hai Lei Wang, Wen Bo Li, Bing Yu Sun

Keyword(s): Outlier Detection, Large Scale, Detection Methods, Support Vector, Support Vector Clustering, Detection Algorithms, Clustering Approach, Vector Clustering, Modeling Data, Selection Of

In this paper, a novel support vector clustering (SVC) method for outlier detection is proposed. Outlier detection algorithms have applications in many tasks, such as data mining, data preprocessing, data filtering and cleaning, and time series analysis. Traditional outlier detection methods mostly model the data based on their statistical properties, and these approaches are preferred only when a large-scale data set is available. To address this limitation, we formulate outlier detection within the support vector clustering framework. Compared with traditional outlier detection methods, the performance of SVC is not sensitive to the selection of the required parameters. The experimental results demonstrate the efficiency of our method.
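The abstract gives no implementation details, so the following is only an illustrative sketch of the first stage of support vector clustering in its standard formulation, not the authors' method. SVC fits the smallest sphere that encloses the data in the feature space of a Gaussian kernel; points left outside that sphere (bounded support vectors, whose dual weight reaches the cap C) are treated as outliers, and a separate second stage, omitted here, assigns cluster labels. The sketch solves the dual problem with projected gradient descent in NumPy; the function names, the kernel width `q`, and the outlier bound `p` (giving C = 1 / (p N)) are hypothetical choices.

```python
import numpy as np


def gaussian_kernel(X, q):
    """K[i, j] = exp(-q * ||x_i - x_j||^2) for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-q * np.maximum(d2, 0.0))


def project_capped_simplex(v, C, iters=60):
    """Euclidean projection of v onto {b : 0 <= b_i <= C, sum(b) = 1}.

    The projection has the form clip(v - tau, 0, C); tau is found by bisection
    because sum(clip(v - tau, 0, C)) is non-increasing in tau.
    """
    lo, hi = v.min() - C, v.max()  # the sum is len(v) * C >= 1 at lo and 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, C).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, C)


def svc_outliers(X, q=0.1, p=0.1, iters=2000, tol=1e-6):
    """Sphere-fitting stage of SVC (sketch): returns (outlier mask, dual weights beta).

    q: Gaussian kernel width (hypothetical default).
    p: upper bound on the fraction of bounded support vectors, via C = 1 / (p * N).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    C = 1.0 / (p * n)
    K = gaussian_kernel(X, q)
    # With K(x, x) = 1, maximising sum_j beta_j K_jj - beta^T K beta is the
    # same as minimising beta^T K beta over the capped simplex.
    beta = np.full(n, 1.0 / n)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(K).max())  # 1 / Lipschitz constant of 2 K beta
    for _ in range(iters):
        beta = project_capped_simplex(beta - step * 2.0 * (K @ beta), C)
    # Bounded support vectors (beta_i = C) lie outside the sphere: report them as outliers.
    return beta >= C - tol, beta
```

A dedicated quadratic-programming or SMO solver would be the usual choice at scale; projected gradient descent is used here only because it fits in a few self-contained lines.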