Relating Latent Class Assignments to External Variables: Standard Errors for Correct Inference

2014 ◽  
Vol 22 (4) ◽  
pp. 520-540 ◽  
Author(s):  
Zsuzsa Bakk ◽  
Daniel L. Oberski ◽  
Jeroen K. Vermunt

Latent class analysis is used in the political science literature in both substantive applications and as a tool to estimate measurement error. Many studies in the social and political sciences relate estimated class assignments from a latent class model to external variables. Although common, such a “three-step” procedure effectively ignores classification error in the class assignments; Vermunt (2010, “Latent class modeling with covariates: Two improved three-step approaches,” Political Analysis 18:450–69) showed that this leads to inconsistent parameter estimates and proposed a correction. Although this correction for bias is now implemented in standard software, inconsistency is not the only consequence of classification error. We demonstrate that the correction method introduces an additional source of variance in the estimates, so that standard errors and confidence intervals are overly optimistic when not taking this into account. We derive the asymptotic variance of the third-step estimates of interest, as well as several candidate-corrected sample estimators of the standard errors. These corrected standard error estimators are evaluated using a Monte Carlo study, and we provide practical advice to researchers as to which should be used so that valid inferences can be obtained when relating estimated class membership to external variables.

2017 ◽  
Vol 78 (6) ◽  
pp. 925-951 ◽  
Author(s):  
Unkyung No ◽  
Sehee Hong

The purpose of the present study is to compare the performance of mixture modeling approaches (i.e., one-step approach, three-step maximum-likelihood approach, three-step BCH approach, and LTB approach) under diverse sample size conditions. To carry out this research, two simulation studies were conducted with two different models: a latent class model with three predictor variables and a latent class model with one distal outcome variable. For the simulation, data were generated under different conditions of sample size (100, 200, 300, 500, 1,000), entropy (0.6, 0.7, 0.8, 0.9), and variance of the distal outcome (homoscedasticity, heteroscedasticity). The evaluation criteria were parameter estimate bias, standard error bias, mean squared error, and coverage. Results demonstrate that the three-step approaches produced more stable and better estimates than the other approaches, even with a small sample size of 100. This research differs from previous studies in that various models were used to compare the approaches and smaller sample size conditions were considered. Furthermore, the results supporting the superiority of the three-step approaches even in poorly manipulated conditions indicate the advantage of these approaches.
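
The four evaluation criteria named in this abstract can be illustrated with a minimal sketch. It assumes each simulation replicate yields one point estimate and one estimated standard error for a single parameter. The function name `monte_carlo_summary`, the Wald-type 95% interval, and the definition of standard error bias as mean estimated SE minus the empirical SD of the estimates are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def monte_carlo_summary(estimates, std_errors, true_value, z=1.96):
    """Summarize simulation replicates for one parameter (hypothetical helper).

    estimates, std_errors: one value per replicate.
    true_value: the value used to generate the data.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    # Empirical SD of the estimates serves as the reference for the SEs.
    empirical_sd = est.std(ddof=1)
    # Wald-type interval per replicate (an assumption of this sketch).
    lower, upper = est - z * se, est + z * se
    covered = (lower <= true_value) & (true_value <= upper)
    return {
        "bias": est.mean() - true_value,
        "se_bias": se.mean() - empirical_sd,
        "mse": np.mean((est - true_value) ** 2),
        "coverage": covered.mean(),
    }

if __name__ == "__main__":
    # Toy replicates, made up purely for illustration.
    summary = monte_carlo_summary([0.9, 1.1, 1.0, 1.4], [0.1, 0.1, 0.1, 0.1], 1.0)
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

Coverage close to the nominal level and a standard error bias near zero would indicate that the reported standard errors can be trusted.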


2010 ◽  
Vol 18 (4) ◽  
pp. 450-469 ◽  
Author(s):  
Jeroen K. Vermunt

Researchers using latent class (LC) analysis often proceed using the following three steps: (1) an LC model is built for a set of response variables, (2) subjects are assigned to LCs based on their posterior class membership probabilities, and (3) the association between the assigned class membership and external variables is investigated using simple cross-tabulations or multinomial logistic regression analysis. Bolck, Croon, and Hagenaars (2004) demonstrated that such a three-step approach underestimates the associations between covariates and class membership. They proposed resolving this problem by means of a specific correction method that involves modifying the third step. In this article, I extend the correction method of Bolck, Croon, and Hagenaars by showing that it involves maximizing a weighted log-likelihood function for clustered data. This conceptualization makes it possible to apply the method not only with categorical but also with continuous explanatory variables, to obtain correct tests using complex sampling variance estimation methods, and to implement it in standard software for logistic regression analysis. In addition, a new maximum likelihood (ML)—based correction method is proposed, which is more direct in the sense that it does not require analyzing weighted data. This new three-step ML method can be easily implemented in software for LC analysis. The reported simulation study shows that both correction methods perform very well in the sense that their parameter estimates and their SEs can be trusted, except for situations with very poorly separated classes. The main advantage of the ML method compared with the Bolck, Croon, and Hagenaars approach is that it is much more efficient and almost as efficient as one-step ML estimation.
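
Step 2 of this procedure and the weighting idea behind the Bolck, Croon, and Hagenaars correction can be sketched as follows. This is a minimal illustration, assuming the posterior class membership probabilities from step 1 are already available as an array. The function name `classify_and_correct` is hypothetical, and the sketch covers only modal assignment, the estimated classification-error matrix, and the resulting weights, not the third-step regression or its standard errors.

```python
import numpy as np

def classify_and_correct(posteriors):
    """Modal assignment and BCH-style weights (illustrative sketch).

    posteriors: (n, k) array; row i holds P(class = t | responses of subject i).
    Returns (assigned, d, weights) where
      assigned[i]   is the modal class of subject i,
      d[t, s]       estimates P(assigned = s | true class = t),
      weights[i, t] is the corrected weight of subject i for class t.
    """
    post = np.asarray(posteriors, dtype=float)
    k = post.shape[1]
    assigned = post.argmax(axis=1)      # step 2: modal assignment
    w = np.eye(k)[assigned]             # one-hot indicators of the assigned class
    # joint[t, s] = sum_i P(class = t | y_i) * 1[assigned_i = s]
    joint = post.T @ w
    d = joint / joint.sum(axis=1, keepdims=True)
    # Inverting d undoes the classification error on average; this fails
    # if d is singular, e.g. when no subject is assigned to some class.
    weights = w @ np.linalg.inv(d)
    return assigned, d, weights

if __name__ == "__main__":
    # Made-up posteriors for four subjects and two classes.
    post = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]])
    assigned, d, weights = classify_and_correct(post)
    print("assigned:", assigned)
    print("classification-error matrix:\n", d)
    print("weights:\n", weights)
```

The weights can be negative, which is one reason the weighted analysis is less efficient when classes are poorly separated. Their column sums reproduce the class sizes implied by the posteriors, which is a quick sanity check on the computation.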


1987 ◽  
Vol 24 (3) ◽  
pp. 298-304
Author(s):  
Rajiv Grover

Only recently have latent class models been used effectively to analyze marketing data, though they have been popular for more than a decade in the social sciences. Most research reported in the literature does not include the standard errors of the estimates of the latent class model parameters. The author argues for the usefulness of standard errors while exploring for parsimonious models. He provides an approach to estimating standard errors of all parameters as estimated by the iterative proportional fitting algorithm of Goodman implemented in MLLSA.


2004 ◽  
Vol 12 (1) ◽  
pp. 3-27 ◽  
Author(s):  
Annabel Bolck ◽  
Marcel Croon ◽  
Jacques Hagenaars

We study the properties of a three-step approach to estimating the parameters of a latent structure model for categorical data and propose a simple correction for a common source of bias. Such models have a measurement part (essentially the latent class model) and a structural (causal) part (essentially a system of logit equations). In the three-step approach, a stand-alone measurement model is first defined and its parameters are estimated. Individual predicted scores on the latent variables are then computed from the parameter estimates of the measurement model and the individual observed scoring patterns on the indicators. Finally, these predicted scores are used in the causal part and treated as observed variables. We show that such a naive use of predicted latent scores cannot be recommended since it leads to a systematic underestimation of the strength of the association among the variables in the structural part of the models. However, a simple correction procedure can eliminate this systematic bias. This approach is illustrated on simulated and real data. A method that uses multiple imputation to account for the fact that the predicted latent variables are random variables can produce standard errors for the parameters in the structural part of the model.


Computation ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 44
Author(s):  
Mahdi Rezapour ◽  
Khaled Ksaibati

The choice of not buckling a seat belt has resulted in a high number of deaths worldwide. Although extensive studies have been done to identify factors of seat belt use, most of those studies have ignored the presence of heterogeneity across vehicle occupants. Not accounting for heterogeneity might result in a bias in model outputs. One of the main approaches to capture random heterogeneity is the employment of the latent class (LC) model by means of a discrete distribution. In a standard LC model, the heterogeneity across observations is considered while assuming the homogeneous utility maximization for decision rules. However, that notion ignores the heterogeneity in the decision rule across individual drivers. In other words, while some drivers make a choice of buckling up with some characteristics, others might ignore those factors while making a choice. Those differences could be accommodated by allowing class allocation to vary based on various socio-economic characteristics and by constraining some of those rules to zero across some of the classes. Thus, in this study, in addition to accounting for heterogeneity across individual drivers, we accounted for heterogeneity in the decision rule by varying the parameters for class allocation. Our results showed that the assignment of various observations to classes is a function of factors such as vehicle type, roadway classification, and vehicle license registration. Additionally, the results showed that a minor consideration of the heterogeneous decision rule resulted in a minor gain in model fits, as well as changes in significance and magnitude of the parameter estimates. All of this was despite the challenges of fully identifying exact attributes for class allocation due to the inclusion of a high number of attributes. The findings of this study have important implications for the use of an LC model to account for not only the taste heterogeneity but also heterogeneity across the decision rule to enhance model fit and to expand our understanding about the unbiased point estimates of parameters.


Author(s):  
Russell Cheng

This book relies on maximum likelihood (ML) estimation of parameters. Asymptotic theory assumes regularity conditions hold when the ML estimator is consistent. Typically an additional third derivative condition is assumed to ensure that the ML estimator is also asymptotically normally distributed. Standard asymptotic results that then hold are summarized in this chapter; for example, the asymptotic variance of the ML estimator is then given by the Fisher information formula, and the log-likelihood ratio, the Wald and the score statistics for testing the statistical significance of parameter estimates are all asymptotically equivalent. Also, the useful profile log-likelihood then behaves exactly as a standard log-likelihood only in a parameter space of just one dimension. Further, the model can be reparametrized to make it locally orthogonal in the neighbourhood of the true parameter value. The large exponential family of models is briefly reviewed where a unified set of regular conditions can be obtained.

