Identifying patterns of item missing survey data using latent groups: an observational study

BMJ Open ◽  
2017 ◽  
Vol 7 (10) ◽  
pp. e017284 ◽  
Author(s):  
Adrian G Barnett ◽  
Paul McElwee ◽  
Andrea Nathan ◽  
Nicola W Burton ◽  
Gavin Turrell

Objectives: To examine whether respondents to a survey of health and physical activity and potential determinants could be grouped according to the questions they missed, known as ‘item missing’.

Design: Observational study of longitudinal data.

Setting: Residents of Brisbane, Australia.

Participants: 6901 people aged 40–65 years in 2007.

Materials and methods: We used a latent class model with a mixture of multinomial distributions and chose the number of classes using the Bayesian information criterion. We used logistic regression to examine if participants’ characteristics were associated with their modal latent class. We used logistic regression to examine whether the amount of item missing in a survey predicted wave missing in the following survey.

Results: Four per cent of participants missed almost one-fifth of the questions, and this group missed more questions in the middle of the survey. Eighty-three per cent of participants completed almost every question, but had a relatively high missing probability for a question on sleep time, a question which had an inconsistent presentation compared with the rest of the survey. Participants who completed almost every question were generally younger and more educated. Participants who completed more questions were less likely to miss the next longitudinal wave.

Conclusions: Examining patterns in item missing data has improved our understanding of how missing data were generated and has informed future survey design to help reduce missing data.
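The latent class approach described in the methods can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each question is reduced to a binary indicator (1 = missed, 0 = answered), so the multinomial for each item has two categories, and it fits the mixture with a plain EM loop. The function name `fit_latent_class`, the iteration count, and the clipping constant are all hypothetical choices made for the example.

```python
import numpy as np

def _log_joint(M, weights, probs):
    # log P(class k) + log P(row i | class k); shape (n_people, n_classes)
    return M @ np.log(probs).T + (1 - M) @ np.log(1 - probs).T + np.log(weights)

def fit_latent_class(M, n_classes, n_iter=300, seed=0):
    """EM for a latent class model on a 0/1 item-missing matrix M.

    Returns class weights, per-class missing probabilities,
    each person's modal class, and the BIC of the fitted model.
    """
    M = np.asarray(M, dtype=float)
    n, n_items = M.shape
    eps = 1e-6  # keeps probabilities away from 0 and 1 (assumed safeguard)
    rng = np.random.default_rng(seed)
    weights = np.full(n_classes, 1.0 / n_classes)
    probs = rng.uniform(0.2, 0.8, size=(n_classes, n_items))
    for _ in range(n_iter):
        # E-step: posterior class membership for every person
        lj = _log_joint(M, weights, probs)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: update class sizes and per-item missing probabilities
        weights = np.clip(resp.mean(axis=0), eps, None)
        weights /= weights.sum()
        class_totals = np.clip(resp.sum(axis=0), eps, None)[:, None]
        probs = np.clip(resp.T @ M / class_totals, eps, 1 - eps)
    lj = _log_joint(M, weights, probs)
    log_lik = np.logaddexp.reduce(lj, axis=1).sum()
    n_params = (n_classes - 1) + n_classes * n_items
    bic = -2.0 * log_lik + n_params * np.log(n)
    modal_class = lj.argmax(axis=1)
    return weights, probs, modal_class, bic
```

To choose the number of classes, one would fit the model for several values of `n_classes` and keep the one with the smallest BIC; the returned `modal_class` could then serve as the outcome in a follow-up regression on participant characteristics.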

Author(s):  
MIIN-SHEN YANG ◽  
HWEI-MING CHEN

Mixture distributions are used as models for analyzing grouped data, and parameter estimation is an important step in fitting them. For discrete data, mixture distributions are generally analyzed with the latent class model. In this paper, we consider parameter estimation for a mixture of logistic regression models. The expectation maximization (EM) algorithm is the method most commonly used to estimate the parameters of logistic regression mixture models. We propose a new type of fuzzy class model and derive an algorithm for estimating the parameters of a fuzzy class logistic regression model. The effects of the explanatory variables on the response variables are described, with a focus on binary responses. The resulting algorithm is called fuzzy classification maximum likelihood (FCML). The accuracy of the FCML and EM algorithms in estimating the parameters of logistic regression mixture models is compared, using a mean squared error (MSE) criterion, on samples drawn from logistic regression mixtures of two classes. Numerical results show that the proposed FCML algorithm achieves good accuracy, and it is recommended as a new tool for parameter estimation in logistic regression mixture models.
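For orientation, the standard mixture of logistic regressions for a binary response, and the EM baseline the abstract refers to, can be written as follows. The notation ($K$ classes, weights $\pi_k$, coefficients $\beta_k$, memberships $z_{ik}$) is assumed here for illustration; the abstract does not give the FCML objective, so it is not reproduced.

```latex
% Mixture model for a binary response y_i with covariates x_i
P(y_i \mid x_i) = \sum_{k=1}^{K} \pi_k \, p_{ik}^{\,y_i} (1 - p_{ik})^{1 - y_i},
\qquad
p_{ik} = \frac{1}{1 + \exp(-x_i^{\top} \beta_k)},
\qquad
\sum_{k=1}^{K} \pi_k = 1 .

% E-step: posterior probability that observation i belongs to class k
z_{ik} = \frac{\pi_k \, p_{ik}^{\,y_i} (1 - p_{ik})^{1 - y_i}}
              {\sum_{l=1}^{K} \pi_l \, p_{il}^{\,y_i} (1 - p_{il})^{1 - y_i}} .

% M-step: update weights; each beta_k maximizes a weighted logistic log-likelihood
\pi_k = \frac{1}{n} \sum_{i=1}^{n} z_{ik},
\qquad
\beta_k = \arg\max_{\beta} \sum_{i=1}^{n} z_{ik}
          \bigl[ y_i \, x_i^{\top} \beta - \log\bigl(1 + \exp(x_i^{\top} \beta)\bigr) \bigr] .
```

In EM the memberships $z_{ik}$ are posterior probabilities; a fuzzy class model instead treats memberships as fuzzy degrees, which is the point where the two algorithms compared in the abstract differ.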


Methodology ◽  
2018 ◽  
Vol 14 (2) ◽  
pp. 56-68 ◽  
Author(s):  
Davide Vidotto ◽  
Jeroen K. Vermunt ◽  
Katrijn Van Deun

Abstract. Latent class analysis has recently been proposed for the multiple imputation (MI) of missing categorical data, using either a standard frequentist approach or a nonparametric Bayesian model called the Dirichlet process mixture of multinomial distributions (DPMM). The main advantage of using a latent class model for multiple imputation is that it is very flexible, in the sense that it can capture complex relationships in the data provided that the number of latent classes is large enough. However, the two existing approaches also have certain disadvantages. The frequentist approach is computationally demanding because it requires estimating many LC models: first, models with different numbers of classes must be estimated to determine the required number of classes, and subsequently the selected model is re-estimated for multiple bootstrap samples to take parameter uncertainty into account during the imputation stage. Whereas the Bayesian Dirichlet process models perform the model selection and the handling of parameter uncertainty automatically, the disadvantage of this method is that it tends to use too small a number of clusters during the Gibbs sampling, leading to an underfitting model that yields invalid imputations. In this paper, we propose an alternative approach that combines the strengths of the two existing approaches; that is, we use the Bayesian standard latent class model as an imputation model. We show how model selection can be performed prior to the imputation step using a single run of the Gibbs sampler and, moreover, show how underfitting is prevented by using large values for the hyperparameters of the mixture weights. The results of two simulation studies and one real-data study indicate that, with a proper setting of the prior distributions, the Bayesian latent class model yields valid imputations and outperforms competing methods.
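A generic Bayesian latent class imputation model of the kind described can be written as below. This is a sketch under assumed notation ($K$ classes, symmetric hyperparameter $\alpha$ for the mixture weights, item priors with hyperparameters $\gamma_j$); the paper's exact prior settings are not given in the abstract.

```latex
% Priors on mixture weights and on the category probabilities of item j in class k
\pi \sim \mathrm{Dirichlet}(\alpha, \ldots, \alpha),
\qquad
\theta_{jk} \sim \mathrm{Dirichlet}(\gamma_j, \ldots, \gamma_j).

% Data model: class membership z_i, then conditionally independent items
z_i \mid \pi \sim \mathrm{Categorical}(\pi),
\qquad
x_{ij} \mid z_i = k \sim \mathrm{Categorical}(\theta_{jk}).

% Gibbs step for class membership, using only the observed items of unit i
P(z_i = k \mid x_i^{\mathrm{obs}}, \pi, \theta)
  \propto \pi_k \prod_{j \in \mathrm{obs}(i)} \theta_{jk}(x_{ij}).

% Imputation step: draw each missing item from the sampled class
x_{ij}^{\mathrm{mis}} \mid z_i = k \sim \mathrm{Categorical}(\theta_{jk}).
```

Under this formulation, a larger $\alpha$ pulls the sampled weights $\pi$ away from zero, so more classes stay populated during sampling, which is consistent with the abstract's remark that large hyperparameter values for the mixture weights prevent underfitting.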


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Lian Lian ◽  
Shuo Zhang ◽  
Zhong Wang ◽  
Kai Liu ◽  
Lihuan Cao

As the parcel delivery service booms in China, competition among express companies is intensifying. This paper employs a multinomial logit (MNL) model and a latent class model (LCM) to investigate customers’ express service choice behavior, using data from a stated preference (SP) survey. The attributes and attribute levels that matter most to express customers are identified. The customers are divided into two segments (a penny pincher segment and a high-end segment) characterized by their taste heterogeneity. The results indicate that the LCM performs statistically better than the MNL model in our sample. Therefore, more attention should be paid to taste heterogeneity, especially in further academic and policy research on freight choice behavior.
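The difference between the two models can be illustrated with a short sketch. Under MNL, every customer shares one taste vector; under a latent class model, each segment has its own taste vector and the choice probability is a share-weighted average of segment-level MNL probabilities. The function names, attributes, and any coefficient values passed in are hypothetical; they are not estimates from the paper.

```python
import numpy as np

def mnl_probs(X, beta):
    """Multinomial logit choice probabilities.

    X: (n_alternatives, n_attributes) attribute levels of each alternative.
    beta: (n_attributes,) taste coefficients shared by all customers.
    """
    v = X @ beta          # systematic utility of each alternative
    v = v - v.max()       # shift for numerical stability; probabilities unchanged
    e = np.exp(v)
    return e / e.sum()

def lcm_probs(X, betas, shares):
    """Latent class choice probabilities.

    betas: (n_segments, n_attributes), one taste vector per segment.
    shares: (n_segments,) segment membership probabilities summing to 1.
    """
    return sum(s * mnl_probs(X, b) for s, b in zip(shares, betas))
```

With two segments, one row of `betas` might weight price heavily (a price-sensitive segment) and the other weight service attributes heavily; a single MNL coefficient vector cannot represent both patterns at once.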


2017 ◽  
Vol 78 (6) ◽  
pp. 925-951 ◽  
Author(s):  
Unkyung No ◽  
Sehee Hong

The purpose of the present study is to compare the performance of mixture modeling approaches (i.e., the one-step approach, the three-step maximum-likelihood approach, the three-step BCH approach, and the LTB approach) under diverse sample size conditions. To carry out this research, two simulation studies were conducted with two different models: a latent class model with three predictor variables and a latent class model with one distal outcome variable. For the simulation, data were generated under different conditions of sample size (100, 200, 300, 500, 1,000), entropy (0.6, 0.7, 0.8, 0.9), and variance of a distal outcome (homoscedasticity, heteroscedasticity). The evaluation criteria were parameter estimate bias, standard error bias, mean squared error, and coverage. Results demonstrate that the three-step approaches produced more stable and better estimates than the other approaches, even with a small sample size of 100. This research differs from previous studies in that various models were used to compare the approaches and smaller sample size conditions were used. Furthermore, the results supporting the superiority of the three-step approaches even in poorly manipulated conditions indicate the advantage of these approaches.
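The entropy condition in the simulation refers to how cleanly cases are separated into classes. A common definition in latent class software is the relative entropy, which equals 1 when every case is assigned to a class with certainty and 0 when the posterior probabilities are uniform. The sketch below assumes that definition; the abstract does not state which formula the authors used, and the function name is hypothetical.

```python
import numpy as np

def relative_entropy(post):
    """Relative entropy of a classification.

    post: (n_cases, n_classes) posterior class probabilities, rows summing to 1.
    Returns 1 - (total classification entropy) / (n_cases * log(n_classes)).
    """
    post = np.asarray(post, dtype=float)
    n, k = post.shape
    safe = np.clip(post, 1e-12, 1.0)          # avoid log(0); 0 * log(.) stays 0
    total_entropy = -(post * np.log(safe)).sum()
    return 1.0 - total_entropy / (n * np.log(k))
```

Values such as 0.6 indicate considerable overlap between classes, which is where classification error in the stepwise approaches matters most.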

