Mean-shift outlier detection and filtering

2021 ◽  
Vol 115 ◽  
pp. 107874
Author(s):  
Jiawei Yang ◽  
Susanto Rahardja ◽  
Pasi Fränti
Author(s):  
Siriwan Phongsasiri ◽  
Suwanna Rasmequan

In this paper, the Probabilistic Mapped Mean-Shift Algorithm is proposed to detect anomalous data in public datasets and in the database of a local hospital's children’s wellness clinic. The proposed framework consists of two main parts. First, the Probabilistic Mapping step consists of k-NN instance acquisition, data distribution calculation, and data point repositioning; a Truncated Gaussian Distribution (TGD) is used to control the boundary of the mapped points. Second, the Outlier Detection step consists of outlier score calculation and outlier selection. Experimental results show that the proposed algorithm outperforms existing algorithms on real-world benchmark datasets and on a Children’s Wellness Clinic dataset (CWD). The outlier detection accuracy of the proposed algorithm on the Wellness, Stamps, Arrhythmia, Pima, and Parkinson datasets was 93%, 94%, 80%, 75%, and 72%, respectively.
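
To make the k-NN repositioning and outlier scoring steps in the abstract above concrete, here is a minimal sketch in Python. It is not the authors' Probabilistic Mapped Mean-Shift Algorithm: the truncated Gaussian mapping is omitted, each point is simply moved to the plain mean of its k nearest neighbours, and the outlier score is assumed to be the total distance a point travels. The function name `knn_mean_shift_scores` and the parameters `k` and `iterations` are hypothetical.

```python
import numpy as np

def knn_mean_shift_scores(X, k=5, iterations=3):
    """Score each point by how far repeated k-NN mean shifting moves it.

    Simplified illustration only: plain neighbour means, no truncated
    Gaussian mapping as described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    Y = X.copy()
    for _ in range(iterations):
        # Pairwise Euclidean distances between the current (shifted) points.
        d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
        # Indices of the k nearest neighbours; each point counts itself
        # because its distance to itself is zero.
        nn = np.argsort(d, axis=1)[:, :k]
        # Reposition every point at the mean of its neighbourhood.
        Y = Y[nn].mean(axis=1)
    # Assumed outlier score: displacement between original and shifted point.
    return np.linalg.norm(X - Y, axis=1)

# Toy data: a tight cluster plus one far-away point (index 50).
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 0.5, size=(50, 2))
outlier = np.array([[8.0, 8.0]])
X = np.vstack([inliers, outlier])

scores = knn_mean_shift_scores(X, k=5)
top = int(np.argmax(scores))  # index of the most outlying point
```

Points inside a dense region barely move under this shift, while an isolated point is pulled a long way toward its nearest cluster, which is why displacement can serve as a score. Outlier selection would then keep the highest-scoring points, for example by a threshold or a fixed count.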


Author(s):  
Zhuang Qi ◽  
Dazhi Jiang ◽  
Xiaming Chen

In linear regression, outliers can seriously distort both the estimated model parameters and the resulting predictions, so outlier detection is a key step in data analysis. In this paper, we adopt a mean shift model and penalize the mean shift parameters to obtain a sparse parameter vector. As the penalty we choose Sorted L1 regularization (SLOPE), which keeps the objective convex and has good statistical properties for parameter selection. We apply an iterative process that combines a gradient descent step with parameter selection at each iteration. The algorithm is computationally efficient because it avoids computing a matrix inverse. Finally, we use cross-validation (CV) and the Bayesian Information Criterion (BIC) to tune the parameters, which helps the method identify outliers and obtain more robust regression coefficients. Experimental results show that the method compares favorably with other outlier detection methods.
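
The mean shift model for regression described above writes each response as y = Xβ + γ + ε, where a nonzero γ_i marks observation i as an outlier. The sketch below illustrates the idea with a proximal gradient loop and, as a stated simplification, an ordinary L1 penalty in place of SLOPE; it is not the authors' algorithm, and the names `mean_shift_regression`, `soft_threshold`, `lam`, and `n_iter` are hypothetical. Like the method in the abstract, it needs no matrix inverse.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (plain L1, not SLOPE)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def mean_shift_regression(X, y, lam=1.0, n_iter=5000):
    """Minimise 0.5 * ||y - X b - g||^2 + lam * ||g||_1 by proximal gradient."""
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(n)
    # Step size 1/L, where L = ||X||_2^2 + 1 is the Lipschitz constant of
    # the gradient of the smooth part with respect to (beta, gamma).
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1.0)
    for _ in range(n_iter):
        r = y - X @ beta - gamma          # current residual
        beta = beta + step * (X.T @ r)    # gradient step on coefficients
        # Gradient step on the mean shifts, then shrink small ones to zero.
        gamma = soft_threshold(gamma + step * r, step * lam)
    return beta, gamma

# Toy data: y = 1 + 2x + noise, with observation 3 shifted by +10.
rng = np.random.default_rng(1)
x = rng.normal(size=40)
X = np.column_stack([np.ones(40), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=40)
y[3] += 10.0

beta, gamma = mean_shift_regression(X, y, lam=1.0)
flagged = np.flatnonzero(gamma)  # observations with a nonzero mean shift
```

The penalty level `lam` acts as the residual size above which an observation is treated as shifted; in the abstract this kind of tuning is done with CV and BIC rather than fixed by hand as here.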


Mathematics ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. 991 ◽  
Author(s):  
Rüdiger Lehmann ◽  
Michael Lösler ◽  
Frank Neitzel

Outlier detection is one of the most important tasks in the analysis of measured quantities to ensure reliable results. In recent years, a variety of multi-sensor platforms have become available, which allow autonomous and continuous acquisition of large quantities of heterogeneous observations. Because the probability that such data sets contain outliers increases with the quantity of measured values, powerful methods are required to identify contaminated observations. In geodesy, the mean shift model (MS) is one of the most commonly used approaches for outlier detection. In addition to the MS model, there is an alternative approach based on the model of variance inflation (VI). In this investigation, the VI approach is derived in detail by truly maximizing the likelihood functions, and it is examined for the detection of one or multiple outliers. In general, the variance inflation approach is non-linear, even if the null model is linear. Thus, an analytical solution usually does not exist, except in the case of repeated measurements. The test statistic is derived from the likelihood ratio (LR) of the models. The VI approach is compared with the MS model in terms of statistical power, identifiability of actual outliers, and numerical effort. The main purpose of this paper is to examine the performance of both approaches in order to derive recommendations for the practical application of outlier detection.
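
For orientation, the two alternative models named above can be written side by side for a single suspected observation k. The notation is an assumption chosen for illustration (design matrix A, parameters β, unit vector e_k, bias ∇_k, inflation factor τ_k) and may differ from the paper's own symbols.

```latex
\begin{align}
  % Null model: no outlier
  H_0 &: \quad y = A\beta + \varepsilon,
      & \varepsilon &\sim \mathcal{N}(0, \Sigma) \\
  % Mean shift: observation k carries an unknown additive bias
  H_A^{\mathrm{MS}} &: \quad y = A\beta + e_k \nabla_k + \varepsilon,
      & \varepsilon &\sim \mathcal{N}(0, \Sigma) \\
  % Variance inflation: observation k has an inflated variance
  H_A^{\mathrm{VI}} &: \quad y = A\beta + \varepsilon,
      & \operatorname{Var}(\varepsilon_k) &= \tau_k \sigma_k^2, \quad \tau_k > 1 \\
  % Generic likelihood ratio test statistic, \theta = \nabla_k or \tau_k
  T &= -2 \log \frac{\max_{\beta} L_0(\beta)}{\max_{\beta,\theta} L_A(\beta, \theta)}
\end{align}
```

In the MS model the extra parameter enters the mean linearly, whereas in the VI model it enters through the covariance, which is consistent with the abstract's remark that the VI approach is non-linear even when the null model is linear.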


Biometrika ◽  
2017 ◽  
Vol 104 (3) ◽  
pp. 633-647 ◽  
Author(s):  
Y. She ◽  
K. Chen

Summary: In high-dimensional multivariate regression problems, enforcing low rank in the coefficient matrix offers effective dimension reduction, which greatly facilitates parameter estimation and model interpretation. However, commonly used reduced-rank methods are sensitive to data corruption, as the low-rank dependence structure between response variables and predictors is easily distorted by outliers. We propose a robust reduced-rank regression approach for joint modelling and outlier detection. The problem is formulated as a regularized multivariate regression with a sparse mean-shift parameterization, which generalizes and unifies some popular robust multivariate methods. An efficient thresholding-based iterative procedure is developed for optimization. We show that the algorithm is guaranteed to converge and that the coordinatewise minimum point produced is statistically accurate under regularity conditions. Our theoretical investigations focus on non-asymptotic robust analysis, demonstrating that joint rank reduction and outlier detection leads to improved prediction accuracy. In particular, we show that redescending ψ-functions can essentially attain the minimax optimal error rate, and in some less challenging problems convex regularization guarantees the same low error rate. The performance of the proposed method is examined through simulation studies and real-data examples.
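
One way to read the "regularized multivariate regression with a sparse mean-shift parameterization" mentioned above is sketched below. The symbols (response matrix Y, predictors X, low-rank coefficients B, mean-shift matrix C, rank bound r, penalty P with level λ) are assumptions made for illustration, and the row-wise penalty shown is only one possible choice.

```latex
\begin{align}
  % Model: low-rank signal plus sparse mean shifts plus noise
  Y &= XB + C + E, \qquad \operatorname{rank}(B) \le r \\
  % Penalised least squares over the coefficients and the mean shifts
  (\hat{B}, \hat{C}) &= \operatorname*{arg\,min}_{B,\, C}\;
      \tfrac{1}{2} \lVert Y - XB - C \rVert_F^2 + P(C; \lambda)
      \quad \text{subject to } \operatorname{rank}(B) \le r \\
  % Example penalty: row-wise group sparsity, so whole observations are flagged
  P(C; \lambda) &= \lambda \sum_{i=1}^{n} \lVert c_i \rVert_2
\end{align}
```

Under this reading, rows of the estimated C that are nonzero correspond to observations treated as outliers, while the rank constraint on B provides the dimension reduction.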


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Crispin M. Mutshinda ◽  
Andrew J. Irwin ◽  
Mikko J. Sillanpää

Abstract: We introduce a Bayesian framework for simultaneous feature selection and outlier detection in sparse high-dimensional regression models, with a focus on quantitative trait locus (QTL) mapping in experimental crosses. More specifically, we incorporate the robust mean shift outlier handling mechanism into the multiple QTL mapping regression model and apply LASSO regularization concurrently to the genetic effects and the mean-shift terms through the flexible extended Bayesian LASSO (EBL) prior structure, thereby combining QTL mapping and outlier detection into a single sparse model representation problem. The EBL priors on the mean-shift terms prevent outlying phenotypic values from distorting the genotype-phenotype association and allow their detection as cases with outstanding mean shift values following the LASSO shrinkage. Simulation results demonstrate the effectiveness of our new methodology at mapping QTLs in the presence of outlying phenotypic values and simultaneously identifying the potential outliers, while maintaining a comparable performance to the standard EBL on outlier-free data.


Author(s):  
Rajendra Kumar Dwivedi ◽  
Rakesh Kumar ◽  
Rajkumar Buyya

A smart healthcare sensor cloud is an amalgamation of body sensor networks and the cloud that facilitates the early diagnosis of diseases and the real-time monitoring of patients. Sensitive patient data stored in the cloud must be free from outliers, which may be caused by malfunctioning hardware or by intruders. This paper presents a machine learning-based scheme for outlier detection in smart healthcare sensor clouds. The proposed scheme is a hybrid of clustering and classification techniques in which a two-level framework is devised to identify outliers precisely. At the first level, a density-based scheme is used for clustering, while at the second level, a Gaussian distribution-based approach is used for classification. The scheme is implemented in Python and compared with a clustering-based approach (Mean Shift) and a classification-based approach (Support Vector Machine) on two different standard datasets. The proposed scheme is evaluated on various performance metrics. Results demonstrate the superiority of the proposed scheme over the existing ones.


2012 ◽  
Vol 2 (3) ◽  
pp. 98-101 ◽  
Author(s):  
E. Sateesh ◽  
M.L. Prasanthi
