Estimating Classification Errors Under Edit Restrictions in Composite Survey-Register Data Using Multiple Imputation Latent Class Modelling (MILC)

2017 ◽  
Vol 33 (4) ◽  
pp. 921-962 ◽  
Author(s):  
Laura Boeschoten ◽  
Daniel Oberski ◽  
Ton de Waal

Abstract Both registers and surveys can contain classification errors. These errors can be estimated by making use of a composite data set. We propose a new method based on latent class modelling to estimate the number of classification errors across several sources, while taking into account impossible combinations with scores on other variables. Furthermore, by multiply imputing a new variable, the latent class model enhances the quality of statistics based on the composite data set. The performance of this method is investigated in a simulation study, which shows that whether the method can be applied depends on the entropy R² of the latent class model and on the type of analysis a researcher is planning to do. Finally, the method is applied to public data from Statistics Netherlands.
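As a rough illustration of the idea, a minimal two-class latent class EM on simulated composite data (our own toy setup with three binary sources and a symmetric 10% error rate, not the Statistics Netherlands application) recovers per-source classification-error rates, and the entropy R² the abstract uses as an applicability check can be read off the fitted posteriors:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a composite data set: a register and two surveys each
# measure the same true binary status with 10% classification error.
# (Illustrative setup only.)
N, J, err = 5000, 3, 0.10
z_true = (rng.random(N) < 0.6).astype(int)
Y = np.where(rng.random((N, J)) < err, 1 - z_true[:, None], z_true[:, None])

# EM for a 2-class latent class model with binary indicators.
pi = np.array([0.5, 0.5])                    # class proportions
theta = rng.uniform(0.3, 0.7, size=(2, J))   # P(y_j = 1 | class)
for _ in range(300):
    # E-step: posterior class membership per record
    logp = np.log(pi) + Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T
    post = np.exp(logp - logp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # M-step: update proportions and conditional response probabilities
    pi = post.mean(axis=0)
    theta = np.clip((post.T @ Y) / post.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)

# Align labels so class 1 is the "positive" class, then read the
# estimated classification-error rate of each source off theta.
if theta[1].mean() < theta[0].mean():
    pi, theta, post = pi[::-1], theta[::-1], post[:, ::-1]
err_hat = 1 - theta[1]        # P(source reports 0 | true class is 1)
print(np.round(err_hat, 3))   # close to the simulated 0.10

# Entropy R^2 of the fitted model: 1 minus posterior entropy
# relative to its maximum, near 1 when classes are well separated.
p = np.clip(post, 1e-12, 1.0)
entropy_r2 = 1 - (-(p * np.log(p)).sum()) / (N * np.log(2))
```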

2016 ◽  
Vol 118 (2) ◽  
pp. 343-361 ◽  
Author(s):  
Eline Poelmans ◽  
Sandra Rousseau

Purpose – The purpose of this paper is to investigate how chocolate lovers balance taste and ethical considerations when selecting chocolate products.

Design/methodology/approach – The data set was collected through a survey at the 2014 “Salon du Chocolat” in Brussels, Belgium. The authors distributed 700 copies and received 456 complete responses (a 65 percent response rate). Choice experiments were used to estimate the relative importance of different chocolate characteristics and to predict respondents’ willingness to pay for marginal changes in those characteristics. The authors estimate both a conditional logit model and a latent class model to take possible preference heterogeneity into account.

Findings – On average, respondents were willing to pay 11 euros more for 250 g of fairtrade-labeled chocolate than for conventional chocolate. However, taste clearly dominates ethical considerations. The authors distinguish three consumer segments, each with a different trade-off between taste and fairtrade: one group clearly valued fairtrade positively, a second group valued fairtrade to a lesser extent, and a third group did not seem to value fairtrade at all.

Originality/value – Chocolate can be seen as a self-indulgent treat for which taste is likely to dominate other characteristics, so it is unclear to what extent ethical factors enter consumer decisions. Interestingly, the results indicate that a significant share of chocolate buyers still value fairtrade characteristics positively when selecting chocolate varieties.
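In a conditional logit model, marginal willingness to pay is the ratio of an attribute coefficient to the (negative) price coefficient. A minimal sketch with illustrative coefficients, chosen here so the implied fairtrade WTP matches the 11-euro figure above (they are not the paper's estimates):

```python
# Marginal willingness to pay from conditional-logit coefficients.
# Coefficient values are illustrative, not the paper's estimates.
beta = {
    "price": -0.20,      # per euro, for a 250 g bar
    "fairtrade": 2.20,   # fairtrade label present
    "taste": 4.10,       # preferred taste variety
}

def marginal_wtp(attr: str) -> float:
    """WTP for a one-unit change in `attr` = -beta_attr / beta_price."""
    return -beta[attr] / beta["price"]

print(round(marginal_wtp("fairtrade"), 2))  # 11.0 euros per 250 g
print(marginal_wtp("taste") > marginal_wtp("fairtrade"))  # True: taste dominates
```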


Methodology ◽  
2018 ◽  
Vol 14 (2) ◽  
pp. 56-68 ◽  
Author(s):  
Davide Vidotto ◽  
Jeroen K. Vermunt ◽  
Katrijn Van Deun

Abstract. Latent class (LC) analysis has recently been proposed for the multiple imputation (MI) of missing categorical data, using either a standard frequentist approach or a nonparametric Bayesian model called the Dirichlet process mixture of multinomial distributions (DPMM). The main advantage of using a latent class model for multiple imputation is its flexibility: it can capture complex relationships in the data, provided the number of latent classes is large enough. However, the two existing approaches also have certain disadvantages. The frequentist approach is computationally demanding because it requires estimating many LC models: first, models with different numbers of classes must be estimated to determine the required number of classes, and then the selected model is re-estimated on multiple bootstrap samples to account for parameter uncertainty during the imputation stage. The Bayesian Dirichlet process model performs model selection and handles parameter uncertainty automatically, but it tends to use too small a number of clusters during Gibbs sampling, leading to an underfitting model that yields invalid imputations. In this paper, we propose an alternative approach that combines the strengths of the two existing approaches: we use the standard Bayesian latent class model as an imputation model. We show how model selection can be performed prior to the imputation step using a single run of the Gibbs sampler and, moreover, how underfitting is prevented by using large values for the hyperparameters of the mixture weights. The results of two simulation studies and one real-data study indicate that, with a proper setting of the prior distributions, the Bayesian latent class model yields valid imputations and outperforms competing methods.
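The Gibbs sampler for such a model cycles through mixture weights, item parameters, and class assignments; a large symmetric Dirichlet hyperparameter on the weights is what keeps small classes from dying out. A minimal sketch of one iteration for binary indicators (our own variable names and priors, not the paper's software):

```python
import numpy as np

def gibbs_step(Y, z, K, alpha, rng):
    """One Gibbs iteration for a Bayesian latent class model with
    binary indicators: sample weights | z, item probabilities | z,
    then z | parameters. A large `alpha` (symmetric Dirichlet
    hyperparameter on the mixture weights) is the underfitting
    safeguard described above."""
    N, J = Y.shape
    # Mixture weights | assignments: Dirichlet(alpha + class counts)
    counts = np.bincount(z, minlength=K)
    pi = rng.dirichlet(alpha + counts)
    # Item probabilities | assignments: conjugate Beta(1, 1) update
    ones = np.zeros((K, J))
    ns = np.zeros((K, 1))
    for k in range(K):
        Yk = Y[z == k]
        ones[k], ns[k] = Yk.sum(axis=0), len(Yk)
    theta = rng.beta(1 + ones, 1 + ns - ones)
    # Assignments | parameters: categorical draw from posteriors
    logp = np.log(pi) + Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(K, p=row) for row in p])
    return pi, theta, z

# Short demonstration run on random binary data.
rng = np.random.default_rng(0)
Y = (rng.random((500, 4)) < 0.5).astype(int)
z = rng.integers(0, 5, size=500)
for _ in range(20):
    pi, theta, z = gibbs_step(Y, z, K=5, alpha=50.0, rng=rng)
```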


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Lian Lian ◽  
Shuo Zhang ◽  
Zhong Wang ◽  
Kai Liu ◽  
Lihuan Cao

As parcel delivery services boom in China, competition among express companies is intensifying. This paper employs a multinomial logit model (MNL) and a latent class model (LCM) to investigate customers’ express-service choice behavior, using data from a stated preference (SP) survey. The attributes and attribute levels that matter most to express customers are identified. The customers fall into two segments (a penny-pincher segment and a high-end segment) characterized by their taste heterogeneity. The results indicate that the LCM fits statistically better than the MNL in our sample. More attention should therefore be paid to taste heterogeneity, especially in further academic and policy research on freight choice behavior.
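Such MNL-versus-LCM comparisons are typically decided with an information criterion, since the LCM spends extra parameters on class-specific coefficients. A minimal BIC sketch with hypothetical fit statistics (illustrative numbers, not the paper's values):

```python
import math

def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical fit statistics for illustration only: an MNL with 6
# attribute coefficients versus a 2-class LCM that doubles the
# coefficients and adds one class-share parameter.
n_obs = 800
bic_mnl = bic(-950.0, 6, n_obs)
bic_lcm = bic(-880.0, 13, n_obs)
print(bic_lcm < bic_mnl)  # True: the LCM wins despite more parameters
```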


2017 ◽  
Vol 78 (6) ◽  
pp. 925-951 ◽  
Author(s):  
Unkyung No ◽  
Sehee Hong

The purpose of the present study is to compare the performance of mixture modeling approaches (the one-step approach, the three-step maximum-likelihood approach, the three-step BCH approach, and the LTB approach) under diverse sample-size conditions. Two simulation studies were conducted with two different models: a latent class model with three predictor variables and a latent class model with one distal outcome variable. Data were generated under different conditions of sample size (100, 200, 300, 500, 1,000), entropy (0.6, 0.7, 0.8, 0.9), and the variance of the distal outcome (homoscedasticity, heteroscedasticity). Parameter estimate bias, standard error bias, mean squared error, and coverage served as evaluation criteria. Results demonstrate that the three-step approaches produced more stable and better estimates than the other approaches, even with a sample size as small as 100. This research differs from previous studies in that various models were used to compare the approaches and smaller sample-size conditions were included. Furthermore, the results supporting the superiority of the three-step approaches even under poorly manipulated conditions underline the advantage of these approaches.
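The entropy values varied in the design (0.6 to 0.9) refer to the standard mixture-model class-separation index, computed from posterior class probabilities. A minimal sketch of that statistic (our own implementation of the textbook formula):

```python
import numpy as np

def relative_entropy(post):
    """Entropy-based class-separation index in [0, 1] for an N x K
    matrix of posterior class probabilities: 1 minus the total
    posterior entropy relative to its maximum N * log(K). Values
    near 1 indicate well-separated classes."""
    n, k = post.shape
    p = np.clip(post, 1e-12, 1.0)
    return 1.0 - (-(p * np.log(p)).sum()) / (n * np.log(k))

crisp = np.array([[1.0, 0.0], [0.0, 1.0]])  # certain assignments
fuzzy = np.full((2, 2), 0.5)                # maximally uncertain
print(round(relative_entropy(crisp), 3))    # 1.0
print(relative_entropy(fuzzy))              # 0.0
```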

