GENERALIZED LATENT CLASS ANALYSIS BASED ON MODEL DOMINANCE THEORY
Latent class analysis is a popular statistical learning approach. A major challenge in learning generalized latent class models is the complexity of searching the huge space of models and parameters, and the computational cost grows as the model topology becomes more flexible. In this paper, we propose the notion of dominance, which leads to strong pruning of the search space and a significant reduction in learning complexity, and we apply it to Generalized Latent Class (GLC) models, a class of Bayesian networks for clustering categorical data. GLC models address the local dependence problem in latent class analysis by assuming a very general graph structure. However, the flexible topology of GLC models leads to a large increase in learning complexity. We first propose the concept of dominance and related theoretical results, which are general for all Bayesian networks. Based on dominance, we then propose an efficient learning algorithm for GLC models. A core technique is regularization, which eliminates dominated models and thereby significantly prunes the search space. Significant improvements on the model.