FCM-Based Model Selection Algorithms for Determining the Number of Clusters

2004, Vol 37 (10), pp. 2027-2037
Author(s): Haojun Sun, Shengrui Wang, Qingshan Jiang

2017, Vol 42 (2), pp. 136-154
Author(s): Woo-yeol Lee, Sun-Joo Cho, Sonya K. Sterba

The current study investigated the consequences of ignoring a multilevel structure in a mixture item response model, in order to show when a multilevel mixture item response model is needed. Study 1 examined the consequences of ignoring dependency for within-level latent classes. Simulation conditions that may affect model selection and parameter recovery in a multilevel data structure were manipulated: class-specific ICC, cluster size, and number of clusters. The accuracy of model selection (based on information criteria) and the quality of parameter recovery were used to evaluate the impact of ignoring a multilevel structure. Simulation results indicated that, for the range of class-specific ICCs examined here (.1 to .3), mixture item response models that ignored a higher-level nesting structure yielded less accurate estimates and standard errors (SEs) of item discrimination parameters when the number of clusters was larger than 24 and the cluster size was larger than six. Class-varying ICCs can have compensatory effects on bias. The results also suggested that a mixture item response model that ignored the multilevel structure was not selected over the multilevel mixture item response model by the Bayesian information criterion (BIC) when both the number of clusters and the cluster size were at least 50. In Study 2, the consequences of unnecessarily fitting a multilevel mixture item response model to single-level data were examined. Reassuringly, for single-level data, the multilevel mixture item response model was not selected by BIC, and its use would not distort the within-level item parameter estimates or SEs when the cluster size was at least 20.
Based on these findings, it is concluded that, for the class-specific ICC conditions examined here, a multilevel mixture item response model is recommended over a single-level item response model for a clustered dataset having cluster size [Formula: see text] and number of clusters [Formula: see text].
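To make the BIC comparison above concrete, here is a minimal sketch of information-criterion model selection: BIC = -2 ln L + p ln n, and the candidate with the smaller BIC is chosen. The log-likelihoods, parameter counts, and the choice of n (taken here as the total number of respondents, 50 clusters of 50) are hypothetical; which n to plug into BIC for multilevel data is itself a modeling assumption.

```python
from __future__ import annotations

import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz's BIC: -2 ln L + p ln n (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(fits: dict[str, tuple[float, int]], n_obs: int) -> str:
    """Return the name of the candidate model with the lowest BIC."""
    return min(fits, key=lambda name: bic(*fits[name], n_obs))


# Hypothetical (log-likelihood, number of parameters) pairs, not results from the study
fits = {
    "single-level mixture IRT": (-5230.4, 41),
    "multilevel mixture IRT": (-5190.2, 45),
}
print(select_by_bic(fits, n_obs=2500))  # prints "multilevel mixture IRT"
```

The extra parameters of the multilevel model cost 4 ln 2500, about 31.3 BIC points, which its gain of 80.4 in -2 ln L more than pays for in this made-up example.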


Methodology, 2021, Vol 17 (2), pp. 127-148
Author(s): Mikkel N. Schmidt, Daniel Seddig, Eldad Davidov, Morten Mørup, Kristoffer Jon Albers, ...

Latent Profile Analysis (LPA) is a method for extracting homogeneous clusters characterized by a common response profile. Previous work applying LPA to human value segmentation tends to select a small number of moderately homogeneous clusters based on model selection criteria such as the Akaike information criterion, the Bayesian information criterion, and entropy. The question is whether a small number of clusters is all that can be gleaned from the data. While some studies have carefully compared different statistical model selection criteria, there are currently no established criteria for assessing whether an increased number of clusters generates meaningful theoretical insights. This article examines the content and meaningfulness of the clusters extracted using two algorithms: Variational Bayesian LPA and Maximum Likelihood LPA. For both methods, our results point to eight as the optimal number of clusters for characterizing distinctive Schwartz value typologies that generate meaningful insights and predict several external variables.
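The entropy criterion mentioned above is, in common LPA software, a normalized classification entropy computed from the posterior profile-membership probabilities p_ik of n respondents over K profiles: E_K = 1 - sum_i sum_k (-p_ik ln p_ik) / (n ln K). The sketch below assumes that definition, which the abstract does not spell out; values near 1 mean respondents are assigned to profiles with little ambiguity, values near 0 mean assignments are close to uniform.

```python
import math


def relative_entropy(posteriors):
    """ASSUMED definition: E_K = 1 - sum_i sum_k (-p_ik ln p_ik) / (n ln K).

    `posteriors` is an n x K list of rows, each row summing to 1.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("entropy is undefined for a single profile")
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # convention: 0 * ln 0 = 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))


print(relative_entropy([[1.0, 0.0], [0.0, 1.0]]))  # crisp assignments
print(relative_entropy([[0.5, 0.5], [0.5, 0.5]]))  # maximally ambiguous
```

Because it only summarizes classification certainty, this statistic can stay high as K grows, which is one reason it cannot by itself settle how many profiles are substantively meaningful.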


Author(s): JINWEN MA, TAIJUN WANG

Gaussian mixture modeling is a powerful approach to data analysis, and determining the number of Gaussians, or clusters, is essentially the problem of Gaussian mixture model selection, which has been investigated from several perspectives. This paper proposes a new kind of automated model selection algorithm for Gaussian mixture modeling based on entropy-penalized maximum-likelihood estimation. Experiments demonstrate that the proposed algorithm performs model selection automatically during parameter estimation, with the mixing proportions of the extra Gaussians attenuating to zero. Compared with the Bayesian Ying-Yang (BYY) automated model selection algorithms, it converges more stably and accurately as the number of samples becomes large.
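The idea of selecting the model during estimation, by letting surplus mixing proportions go to zero, can be illustrated with EM on a deliberately over-sized mixture. The abstract does not give the form of the entropy penalty, so the sketch below swaps in a different, plainly named technique: the truncated weight update of Figueiredo and Jain, alpha_k proportional to max(0, n_k - d/2) with d free parameters per component, simplified here to standard (not component-wise) EM in one dimension. The function name and defaults are hypothetical.

```python
import numpy as np


def fit_gmm_with_pruning(x, k_max=6, n_iter=200, seed=0):
    """1-D Gaussian mixture EM with a truncated weight update.

    ASSUMPTION: Figueiredo-Jain-style update, alpha_k proportional to
    max(0, n_k - d/2), swapped in for illustration; NOT Ma and Wang's
    entropy-penalized algorithm.
    """
    x = np.asarray(x, dtype=float)
    if x.size <= k_max:
        raise ValueError("need more observations than initial components")
    rng = np.random.default_rng(seed)
    d = 2  # free parameters per 1-D component: mean and variance
    mu = rng.choice(x, size=k_max, replace=False)  # initial means drawn from the data
    var = np.full(k_max, x.var() + 1e-6)
    alpha = np.full(k_max, 1.0 / k_max)
    for _ in range(n_iter):
        # E-step: log(alpha_k * N(x_i | mu_k, var_k)), stabilized before exponentiating
        log_w = np.log(alpha) - 0.5 * (np.log(2 * np.pi * var) + (x[:, None] - mu) ** 2 / var)
        log_w -= log_w.max(axis=1, keepdims=True)
        resp = np.exp(log_w)
        resp /= resp.sum(axis=1, keepdims=True)
        nk = resp.sum(axis=0)  # effective number of points per component
        # Truncated weight update: components with n_k <= d/2 get weight exactly 0
        alpha = np.maximum(0.0, nk - d / 2.0)
        alpha /= alpha.sum()
        keep = alpha > 0
        alpha, resp, nk = alpha[keep], resp[:, keep], nk[keep]
        # Usual M-step for the surviving means and variances
        mu = resp.T @ x / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # floor avoids collapse
    return alpha, mu, var


rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(4.0, 1.0, 300)])
alpha, mu, var = fit_gmm_with_pruning(data, k_max=6)
print(len(alpha), np.round(alpha, 3), np.round(mu, 2))
```

With d = 2 a component is dropped only once its effective count falls below one, so near-duplicate components can survive this simplified variant; treat the output as an illustration of weight annihilation, not as a reproduction of the paper's results.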


2011, Vol 27 (2), pp. 269-296
Author(s): Jennifer L. Castle, Xiaochuan Qin, W. Robert Reed

2011, Vol 2 (2)
Author(s): Satkartar K. Kinney, Jerome P. Reiter, James O. Berger

Several statistical agencies use, or are considering the use of, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use files. For example, agencies can release partially synthetic datasets, comprising the units originally surveyed with some values, such as sensitive values at high risk of disclosure, or values of key identifiers, replaced with multiple imputations. We describe how secondary analysts of such multiply-imputed datasets can implement Bayesian model selection procedures that appropriately condition on the multiple datasets and the information released by the agency about the imputation models. We illustrate by deriving Bayes factor approximations and a data augmentation step for stochastic search variable selection algorithms.
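For orientation, the standard Schwarz (BIC) approximation to a Bayes factor is ln BF_10 ≈ (BIC_0 - BIC_1) / 2, and an analyst holding m synthetic datasets can compute it on each one. The abstract does not give the authors' approximation for multiply-imputed data, so the pooling step below is a hypothetical placeholder (a plain average of per-dataset log Bayes factors) that does not condition on the agency's imputation models the way the described procedures do. All BIC values are made up.

```python
def log_bf_bic(bic_0, bic_1):
    """Schwarz/BIC approximation to the log Bayes factor of model 1 versus model 0."""
    return 0.5 * (bic_0 - bic_1)


def naive_pooled_log_bf(bic_pairs):
    """HYPOTHETICAL pooling rule: simple average over the m released datasets.

    Placeholder only; it ignores the information released about the imputation models.
    """
    per_dataset = [log_bf_bic(b0, b1) for b0, b1 in bic_pairs]
    return sum(per_dataset) / len(per_dataset)


# Hypothetical (BIC of model 0, BIC of model 1) for m = 3 synthetic datasets
pairs = [(1210.0, 1202.0), (1208.5, 1203.5), (1211.0, 1201.0)]
print(naive_pooled_log_bf(pairs))
```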


2016
Author(s): Andrea Rau, Cathy Maugis-Rabusseau

Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of whether and how such methods may be applied to RNA-seq data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and the number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be quite strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and of the number of clusters, and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq datasets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we present an R package, coseq (co-expression of RNA-seq data), to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
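As an illustration of the kind of transformation the abstract refers to, the sketch below turns a genes x samples count matrix into per-gene expression profiles (each row divided by its total) and applies the arcsine square-root transform, after which a Gaussian mixture could be fitted. The function name and the choice of transform are assumptions for illustration, not the coseq interface, and normalization for sequencing depth, which would normally precede this step, is omitted.

```python
import numpy as np


def arcsine_profiles(counts):
    """Per-gene profiles p_ij = y_ij / sum_j y_ij, then arcsin(sqrt(p_ij)).

    Hypothetical helper for illustration; not part of any package API.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("remove genes with zero total count first")
    profiles = counts / totals  # each row now sums to 1
    return np.arcsin(np.sqrt(profiles))  # values lie in [0, pi/2]


print(arcsine_profiles([[0, 10], [5, 5]]))
```

When a penalized likelihood criterion is used to choose between transformations, log-likelihoods computed on different scales are not directly comparable; they have to be brought back to a common scale, for example by adding the log-Jacobian of each transform.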

