Identifying Outliers in Response Quality Assessment by Using Multivariate Control Charts Based on Kernel Density Estimation

2021 ◽  
Vol 37 (1) ◽  
pp. 97-119
Author(s):  
Jiayun Jin ◽  
Geert Loosveldt

Abstract When monitoring industrial processes, a Statistical Process Control tool such as the multivariate Hotelling T² chart is frequently used to evaluate multiple quality characteristics. However, research into the use of T² charts for survey fieldwork (essentially a production process in which data sets are produced by means of interviews) has so far been scant. In this study, using data from the eighth round of the European Social Survey in Belgium, we present a procedure for simultaneously monitoring six response quality indicators and identifying outliers, that is, interviews with anomalous results. The procedure integrates Kernel Density Estimation (KDE) with a T² chart, so that neither historical "in-control" data nor the assumption of a parametric distribution for the indicators is required. In total, 75 outliers (4.25%) are iteratively removed, resulting in an in-control data set containing 1,691 interviews. The outliers are mainly characterized by longer sequences of identical answers, a greater number of extreme answers and, contrary to expectation, a lower item nonresponse rate. The procedure is validated by means of ten-fold cross-validation and by comparison with the minimum covariance determinant algorithm as the criterion. By providing a method of obtaining in-control data, the present findings go some way toward monitoring response quality, identifying problems, and providing rapid feedback during survey fieldwork.
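
The abstract does not spell out the exact algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: compute Hotelling T² for every interview, take the upper control limit as the (1 − alpha) quantile of a Gaussian KDE fitted to the T² values (instead of an F or chi-square limit), and repeatedly drop points above the limit. The function names, the normal-reference bandwidth, the value of alpha, and the stopping rule (stop when nothing exceeds the limit or after max_iter rounds) are all hypothetical choices for illustration, and the data are synthetic.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # element-wise stdlib error function


def hotelling_t2(X):
    """T^2 statistic of every row of X against the sample mean and covariance."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form diff_i^T S^{-1} diff_i
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


def kde_upper_limit(values, alpha=0.01):
    """(1 - alpha) quantile of a Gaussian KDE fitted to the 1-D array `values`."""
    n = len(values)
    # Normal-reference rule-of-thumb bandwidth (an assumption for this sketch)
    h = 1.06 * values.std(ddof=1) * n ** (-1 / 5)

    def cdf(x):
        # The KDE's CDF is the average of normal CDFs centred at the observations
        return float(np.mean(0.5 * (1.0 + _erf((x - values) / (h * math.sqrt(2.0))))))

    lo, hi = values.min() - 10 * h, values.max() + 10 * h
    for _ in range(100):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iterative_kde_t2(X, alpha=0.01, max_iter=10):
    """Repeatedly drop rows whose T^2 exceeds the KDE-based control limit.

    Stops when no row exceeds the limit or after max_iter rounds (an assumed
    stopping rule). Returns a boolean mask of retained rows and the last limit.
    """
    keep = np.ones(len(X), dtype=bool)
    limit = None
    for _ in range(max_iter):
        t2 = hotelling_t2(X[keep])
        limit = kde_upper_limit(t2, alpha)
        above = t2 > limit
        if not above.any():
            break
        keep[np.flatnonzero(keep)[above]] = False
    return keep, limit


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))  # synthetic stand-in for six quality indicators
X[:10] += 8.0                  # ten injected anomalous "interviews"
keep, limit = iterative_kde_t2(X, alpha=0.05, max_iter=5)
print(f"removed {np.sum(~keep)} rows, final limit {limit:.2f}")
```

Because the limit comes from a smoothed empirical distribution of T², no multivariate-normal assumption is needed for the cut-off itself; the price is that the bandwidth and alpha now drive how many points are flagged in each round.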

2019 ◽  
Vol 6 (1) ◽  
pp. 1665949
Author(s):  
Muhammad Mashuri ◽  
Haryono Haryono ◽  
Diaz Fitra Aksioma ◽  
Wibawati Wibawati ◽  
Muhammad Ahsan ◽  
...  

2011 ◽  
Vol 5 (2) ◽  
pp. 181-193
Author(s):  
Qing Liu ◽  
David Pitt ◽  
Xibin Zhang ◽  
Xueyuan Wu

Abstract In this paper, we present a Markov chain Monte Carlo (MCMC) simulation algorithm for estimating parameters in the kernel density estimation of bivariate insurance claim data via transformations. Our data set consists of two types of auto insurance claim costs and exhibits a high level of skewness in the marginal empirical distributions. Therefore, a kernel density estimator applied to the original data does not perform well. However, the density of the original data can be obtained by estimating the density of the transformed data with kernels. It is well known that the performance of a kernel density estimator is mainly determined by the bandwidth, and only in a minor way by the kernel. In the current literature, there have been some developments in estimating densities based on transformed data, where bandwidth selection usually depends on predetermined transformation parameters. Moreover, in the bivariate situation, the transformation parameters have been estimated for each dimension individually. We use a Bayesian sampling algorithm and present a Metropolis-Hastings sampling procedure to sample the bandwidths and transformation parameters from their posterior density. Our contribution is to estimate the bandwidths and transformation parameters simultaneously within a Metropolis-Hastings sampling procedure. Moreover, we demonstrate that the correlation between the two dimensions is better captured by the bivariate density estimator based on transformed data.
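
As a rough illustration of the two ingredients named above (a transformation KDE and Metropolis-Hastings sampling of bandwidths), here is a minimal sketch that makes simplifying assumptions: it uses a fixed log transformation rather than estimating transformation parameters, so only the two bandwidths are sampled; it scores bandwidths with a leave-one-out KDE likelihood; and it assumes a prior p(h) ∝ 1/(1 + h²) on each bandwidth. All function names and the synthetic lognormal data are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def loo_log_likelihood(Y, h):
    """Leave-one-out log-likelihood of a product-Gaussian KDE on the rows of Y."""
    n, d = Y.shape
    z = (Y[:, None, :] - Y[None, :, :]) / h             # (n, n, d) scaled differences
    k = np.exp(-0.5 * np.sum(z ** 2, axis=2))
    k /= (2 * np.pi) ** (d / 2) * np.prod(h)             # normalise the product kernel
    np.fill_diagonal(k, 0.0)                             # leave each point out of its own estimate
    dens = k.sum(axis=1) / (n - 1)
    return float(np.sum(np.log(np.maximum(dens, 1e-300))))


def log_posterior(theta, Y):
    """Log posterior of theta = log(h) under an assumed prior p(h_j) ∝ 1 / (1 + h_j^2)."""
    h = np.exp(theta)
    log_prior = np.sum(-np.log1p(h ** 2))
    log_jacobian = np.sum(theta)                          # dh/dtheta = h for h = exp(theta)
    return loo_log_likelihood(Y, h) + log_prior + log_jacobian


def sample_bandwidths(Y, n_iter=1000, step=0.2, seed=0):
    """Random-walk Metropolis-Hastings on log-bandwidths; returns draws of h."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    theta = np.log(Y.std(axis=0, ddof=1) * n ** (-1 / (d + 4)))  # rule-of-thumb start
    lp = log_posterior(theta, Y)
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal(d)
        lp_new = log_posterior(proposal, Y)
        if np.log(rng.uniform()) < lp_new - lp:           # symmetric proposal: plain MH ratio
            theta, lp = proposal, lp_new
        draws[t] = np.exp(theta)
    return draws


def density_original_scale(x, X, h):
    """Back-transform: f_X(x1, x2) = f_Y(log x1, log x2) / (x1 * x2)."""
    Y = np.log(X)
    z = (np.log(x) - Y) / h
    f_y = np.mean(np.exp(-0.5 * np.sum(z ** 2, axis=1))) / (2 * np.pi * np.prod(h))
    return float(f_y / np.prod(x))


rng = np.random.default_rng(1)
X = np.exp(rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=200))  # skewed, positive
draws = sample_bandwidths(np.log(X), n_iter=1000)
h_hat = draws[500:].mean(axis=0)   # posterior mean after a burn-in of 500 draws
print(h_hat, density_original_scale(np.array([1.0, 1.0]), X, h_hat))
```

Sampling on the log scale keeps the bandwidths positive; the extra `theta` term is the Jacobian that makes the chain target the intended posterior in h rather than a distorted one.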


2010 ◽  
Vol 20-23 ◽  
pp. 389-394
Author(s):  
Zhi Feng Hao ◽  
Rui Chu Cai ◽  
Tang Wu ◽  
Yi Yuan Zhou

Association rules provide a concise statement of potentially useful information and have been widely used in real applications. However, the usefulness of association rules depends heavily on the interestingness measure used to select interesting rules from millions of candidates. In this study, a probability analysis of association rules is conducted and, on this basis, an interestingness measure based on discrete kernel density estimation is proposed. The newly proposed interestingness measure makes full use of the information contained in the data set and achieves a much lower false discovery rate than existing interestingness measures. Experimental results show the effectiveness of the proposed interestingness measure.
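
The abstract does not define the proposed measure, so the sketch below only illustrates what discrete kernel density estimation can look like for a rule A → B on binary transaction data. It swaps in the Aitchison–Aitken kernel for binary variables (our assumption, not necessarily the authors' kernel) to smooth the joint cell probabilities, then forms a smoothed confidence P(B = 1 | A = 1). The smoothing parameter `lam` and the function names are hypothetical; in practice `lam` would typically be chosen by something like cross-validation.

```python
import itertools

import numpy as np


def aitchison_aitken_probs(data, lam):
    """Smoothed probability of every binary cell using the Aitchison-Aitken kernel.

    For binary variables the kernel gives weight (1 - lam) to an agreeing
    coordinate and lam to a disagreeing one; lam = 0 reproduces raw frequencies.
    """
    data = np.asarray(data, dtype=int)
    d = data.shape[1]
    probs = {}
    for cell in itertools.product([0, 1], repeat=d):
        match = data == np.array(cell)                    # (n, d) agreement pattern
        weights = np.where(match, 1.0 - lam, lam).prod(axis=1)
        probs[cell] = float(weights.mean())
    return probs


def smoothed_confidence(data, lam):
    """Kernel-smoothed estimate of P(B = 1 | A = 1) for the rule A -> B."""
    p = aitchison_aitken_probs(data, lam)
    return p[(1, 1)] / (p[(1, 0)] + p[(1, 1)])


transactions = np.array([[1, 1], [1, 1], [1, 0], [0, 1], [0, 0]])  # columns: item A, item B
print(smoothed_confidence(transactions, 0.0))  # lam = 0 recovers the empirical confidence 2/3
print(smoothed_confidence(transactions, 0.2))  # smoothing pulls the estimate toward 0.5
```

Smoothing shrinks estimates built from few supporting transactions, which is one plausible route to fewer spurious "interesting" rules; whether this matches the paper's mechanism is not confirmed by the abstract.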


2021 ◽  
Vol 27 (1) ◽  
pp. 57-69
Author(s):  
Yasmina Ziane ◽  
Nabil Zougab ◽  
Smail Adjabi

Abstract In this paper, we consider procedures for deriving variable bandwidths in univariate kernel density estimation for nonnegative heavy-tailed (HT) data. These procedures use the Birnbaum–Saunders power-exponential (BS-PE) kernel estimator and a Bayesian approach to the adaptive bandwidths. We adapt an algorithm that subdivides the HT data set into two regions, a high-density region (HDR) and a low-density region (LDR), and assigns a bandwidth parameter to each region. The bandwidths are derived using a Markov chain Monte Carlo (MCMC) sampling algorithm. A series of simulation studies and a real-data application are used to evaluate the performance of the proposed procedure.
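
To make the two-region idea concrete, here is a minimal sketch under several assumptions. It swaps in the ordinary Birnbaum–Saunders kernel in place of the BS-PE kernel, splits the data at an assumed quantile threshold (the abstract does not state the split rule), and uses fixed, hand-picked bandwidths for the HDR and LDR instead of MCMC-derived ones. As in other associated-kernel estimators, the kernel at target point x is a BS density with scale x and shape √h, evaluated at each observation. Function names and data are hypothetical.

```python
import numpy as np


def bs_kernel(t, x, h):
    """Birnbaum-Saunders density with shape sqrt(h) and scale x, evaluated at t > 0."""
    alpha = np.sqrt(h)
    r = x / t
    norm = 2.0 * alpha * x * np.sqrt(2.0 * np.pi)
    return (np.sqrt(r) + r ** 1.5) / norm * np.exp(-(t / x + x / t - 2.0) / (2.0 * alpha ** 2))


def two_region_estimate(x, data, h_high, h_low, threshold):
    """Associated-kernel density estimate at x > 0 with one bandwidth per region.

    Observations at or below `threshold` form the assumed high-density region
    (HDR) and use h_high; the heavy-tail observations (LDR) use h_low.
    """
    data = np.asarray(data, dtype=float)
    h = np.where(data <= threshold, h_high, h_low)
    return float(np.mean(bs_kernel(data, x, h)))


rng = np.random.default_rng(0)
data = rng.pareto(2.5, size=500) + 0.1       # positive, heavy-tailed sample
threshold = np.quantile(data, 0.9)            # assumed HDR/LDR split point
grid = np.linspace(0.05, 5.0, 100)
f_hat = np.array([two_region_estimate(x, data, 0.05, 0.2, threshold) for x in grid])
print(f_hat[:5])
```

Giving the sparse tail a larger bandwidth than the dense body is the usual motivation for adaptive schemes: the body keeps its detail while isolated large observations are smoothed instead of producing spikes.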


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Wenzhong Shi ◽  
Chengzhuo Tong ◽  
Anshu Zhang ◽  
Bin Wang ◽  
Zhicheng Shi ◽  
...  

A Correction to this paper has been published: https://doi.org/10.1038/s42003-021-01924-6

