The Effect of Cluster Size, Dimensionality, and Number of Clusters on Recovery of True Cluster Structure Through Chernoff-Type Faces

Author(s):  
P. C. Saxena ◽  
K. Navaneetham
2004 ◽  
Vol 82 (4) ◽  
pp. 323-329
Author(s):  
A Ulug ◽  
M Karakaplan ◽  
B Ulug

Clustering in some two- and three-dimensional lattices is investigated using an algorithm similar to that of Hoshen–Kopelman. The total number of clusters reaches a maximum at an occupation probability, p_max, where the average cluster size, 2.03 ± 0.07, is found to be independent of the size, dimension, coordination number, and type of lattice. We discuss the fact that clustering effectively begins at p_max. The percolation threshold, p_c, and p_max are found to get closer to each other as the coordination number increases. PACS Nos.: 64.60.Ht, 64.60.Qb, 82.30.Nr
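To make the cluster-counting step concrete, here is a minimal sketch of a raster-scan, union-find labeling pass in the spirit of Hoshen–Kopelman. It is not the authors' code: the square lattice with free boundaries, nearest-neighbor connectivity, the function names, and the definition of average cluster size as occupied sites divided by number of clusters are all assumptions made for illustration.

```python
import random


def cluster_sizes(grid):
    """Return the sizes of nearest-neighbor clusters of occupied sites.

    Assumption: `grid` is a rectangular list of rows on a square lattice
    with free (non-periodic) boundaries; truthy entries are occupied.
    """
    parent = {}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for r, row in enumerate(grid):
        for c, occupied in enumerate(row):
            if not occupied:
                continue
            site = (r, c)
            parent[site] = site
            # Raster scan: only the upper and left neighbors are already labeled.
            if r > 0 and grid[r - 1][c]:
                union((r - 1, c), site)
            if c > 0 and grid[r][c - 1]:
                union((r, c - 1), site)

    sizes = {}
    for site in parent:
        root = find(site)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def random_lattice(n, p, rng):
    """Occupy each site of an n x n lattice independently with probability p."""
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]


def scan(n=64, steps=20, seed=0):
    """Yield (p, number of clusters, mean cluster size) over a grid of p values."""
    rng = random.Random(seed)
    for k in range(1, steps):
        p = k / steps
        sizes = cluster_sizes(random_lattice(n, p, rng))
        mean = sum(sizes) / len(sizes) if sizes else 0.0
        yield p, len(sizes), mean
```

Iterating over `scan()` and locating the p with the largest cluster count gives a rough single-sample estimate of p_max for this lattice; averaging over many seeds would be needed before comparing with the reported values.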


Methodology ◽  
2012 ◽  
Vol 8 (4) ◽  
pp. 146-158 ◽  
Author(s):  
Mirjam Moerbeek

In cluster randomized trials, complete groups of subjects are randomized to treatment conditions. An important question might be whether and when the subjects experience a particular event, such as smoking initiation or recovery from disease. In the social sciences, the timing of such events is often measured in discrete time by using time intervals. At the planning phase of a cluster randomized trial, one should decide on the number of clusters and the cluster size such that parameters are estimated accurately and sufficient power for the test of the treatment effect is achieved. On the basis of a simulation study, it is concluded that regression coefficients are estimated more accurately than the variance of the random cluster effect. In addition, it is shown that power increases with cluster size and number of clusters, and that sufficient power cannot always be achieved by using larger cluster sizes at a fixed number of clusters.


2017 ◽  
Vol 42 (2) ◽  
pp. 136-154 ◽  
Author(s):  
Woo-yeol Lee ◽  
Sun-Joo Cho ◽  
Sonya K. Sterba

The current study investigated the consequences of ignoring a multilevel structure for a mixture item response model to show when a multilevel mixture item response model is needed. Study 1 focused on examining the consequence of ignoring dependency for within-level latent classes. Simulation conditions that may affect model selection and parameter recovery in the context of a multilevel data structure were manipulated: class-specific ICC, cluster size, and number of clusters. The accuracy of model selection (based on information criteria) and quality of parameter recovery were used to evaluate the impact of ignoring a multilevel structure. Simulation results indicated that, for the range of class-specific ICCs examined here (.1 to .3), mixture item response models which ignored a higher level nesting structure resulted in less accurate estimates and standard errors (SEs) of item discrimination parameters when the number of clusters was larger than 24 and the cluster size was larger than six. Class-varying ICCs can have compensatory effects on bias. Also, the results suggested that a mixture item response model which ignored multilevel structure was not selected over the multilevel mixture item response model based on the Bayesian information criterion (BIC) if the number of clusters and the cluster size were each at least 50. In Study 2, the consequences of unnecessarily fitting a multilevel mixture item response model to single-level data were examined. Reassuringly, in the context of single-level data, a multilevel mixture item response model was not selected by BIC, and its use would not distort the within-level item parameter estimates or SEs when the cluster size was at least 20. Based on these findings, it is concluded that, for the class-specific ICC conditions examined here, a multilevel mixture item response model is recommended over a single-level item response model for a clustered dataset having cluster size [Formula: see text] and the number of clusters [Formula: see text].


2003 ◽  
Vol 17 (01n02) ◽  
pp. 209-212
Author(s):  
J. G. CAO ◽  
L. LU ◽  
L. W. ZHOU

With the combined application of an electric field and a shear field, the particles in an ER fluid form a cluster structure in the direction of shear. The morphology of the cluster structure is described, and it is found in the experiments that the mean cluster size, S_mean, is proportional to [Formula: see text] and E^0.64.


2019 ◽  
Author(s):  
Joshua Nugent ◽  
Ken Kleinman

Abstract Background: Linear mixed models (LMM) are a common approach to analyzing data from cluster randomized trials (CRTs). Inference on parameters can be performed via Wald tests or likelihood ratio tests (LRT), but both approaches may give incorrect Type I error rates in common finite sample settings. The impact of interactions of cluster size, number of clusters, intraclass correlation coefficient (ICC), and analysis approach on Type I error rates has not been well studied. Reviews of published CRTs find that small sample sizes are not uncommon, so the performance of different inferential approaches in these settings can guide data analysts to the best choices. Methods: Using a random-intercept LMM structure, we use simulations to study Type I error rates with the LRT and Wald test with different degrees of freedom (DF) choices across different combinations of cluster size, number of clusters, and ICC. Results: Our simulations show that the LRT can be anti-conservative when the ICC is large and the number of clusters is small, with the effect most pronounced when the cluster size is relatively large. Wald tests with the between-within DF method or the Satterthwaite DF approximation maintain Type I error control at the stated level, though they are conservative when the number of clusters, the cluster size, and the ICC are small. Conclusions: Depending on the structure of the CRT, analysts should choose a hypothesis testing approach that will maintain the appropriate Type I error rate for their data. Wald tests with the Satterthwaite DF approximation work well in many circumstances, but in other cases the LRT may have Type I error rates closer to the nominal level.
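As an illustration of the data-generating step such a simulation needs, the following is a minimal sketch of drawing outcomes from a random-intercept model under the null hypothesis of no treatment effect. It is not the authors' simulation code: the function name, the alternating assignment of clusters to arms, and the choice to fix the total variance at 1 (so that the between-cluster variance equals the ICC) are assumptions made for illustration.

```python
import random


def simulate_null_crt(n_clusters, cluster_size, icc, seed=0):
    """Draw (cluster, arm, outcome) rows from y_ij = u_j + e_ij.

    Assumptions: total variance is 1, so Var(u_j) = icc and
    Var(e_ij) = 1 - icc; the treatment effect is zero (null hypothesis);
    clusters alternate between arm 0 and arm 1.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    rng = random.Random(seed)
    sd_between = icc ** 0.5
    sd_within = (1.0 - icc) ** 0.5
    rows = []
    for j in range(n_clusters):
        arm = j % 2
        u_j = rng.gauss(0.0, sd_between)  # shared random intercept for cluster j
        for _ in range(cluster_size):
            y = u_j + rng.gauss(0.0, sd_within)  # subject-level noise
            rows.append((j, arm, y))
    return rows
```

A Type I error study would repeat this draw many times for each combination of `n_clusters`, `cluster_size`, and `icc`, fit the model to each data set with the chosen test, and record how often the null is rejected at the nominal level.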


2020 ◽  
Author(s):  
Joshua Nugent ◽  
Ken Kleinman

Abstract Background: Linear mixed models (LMM) are a common approach to analyzing data from cluster randomized trials (CRTs). Inference on parameters can be performed via Wald tests or likelihood ratio tests (LRT), but both approaches may give incorrect Type I error rates in common finite sample settings. The impact of different combinations of cluster size, number of clusters, intraclass correlation coefficient (ICC), and analysis approach on Type I error rates has not been well studied. Reviews of published CRTs find that small sample sizes are not uncommon, so the performance of different inferential approaches in these settings can guide data analysts to the best choices. Methods: Using a random-intercept LMM structure, we use simulations to study Type I error rates with the LRT and Wald test with different degrees of freedom (DF) choices across different combinations of cluster size, number of clusters, and ICC. Results: Our simulations show that the LRT can be anti-conservative when the ICC is large and the number of clusters is small, with the effect most pronounced when the cluster size is relatively large. Wald tests with the between-within DF method or the Satterthwaite DF approximation maintain Type I error control at the stated level, though they are conservative when the number of clusters, the cluster size, and the ICC are small. Conclusions: Depending on the structure of the CRT, analysts should choose a hypothesis testing approach that will maintain the appropriate Type I error rate for their data. Wald tests with the Satterthwaite DF approximation work well in many circumstances, but in other cases the LRT may have Type I error rates closer to the nominal level.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Paulina Zydorowicz ◽  
Michał Jankowski ◽  
Katarzyna Dziubalska-Kołaczyk

Abstract The aim of this contribution is to identify the dominant shapes of the Polish word with reference to three criteria: cluster complexity (i.e., cluster size), saturation (the number of clusters in a word), and diversity (in terms of features of consonant description). The dominant word shape is understood as the most frequent or typical skeletal pattern, expressed by means of alternations or groupings of Cs (consonants) and Vs (vowels), e.g., CVCCV etc., or by means of specific features (of place, manner, voice, and the sonorant/obstruent distinction). Our work focuses on two aspects of Polish phonotactics: (1) the relation between cluster complexity and saturation of words with clusters, (2) the degrees of diversity in features of place, manner, and voice within clusters. Using corpus data, we have established that only 4.17% of word shapes have no clusters. The dominant word shape for a one-cluster word is CVCCVCV. The most frequent scenario for a word shape is to contain two clusters, of which 67% are a combination of a word-initial and a word-medial cluster. We have found that: (1) cluster length is inversely proportional to the number of clusters in a word; (2) nearly 73% of word types contain clusters of the same size, e.g., two CCs or two CCCs (Polish words prefer saturation over complexity); (3) manner of articulation (MOA) is more diversified than place of articulation (POA) across clusters and words.
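The skeletal patterns and the two cluster measures defined above can be sketched in a few lines. This is an illustrative sketch only, not the authors' corpus procedure: the vowel inventory, the function names, and the requirement that input be pre-segmented into one item per sound (so that a digraph such as "sz" is passed as a single segment) are assumptions.

```python
import re

# Assumption: a simplified set of Polish vowel letters, for illustration only.
VOWELS = frozenset("aeiouyąęó")


def skeleton(segments, vowels=VOWELS):
    """Map a sequence of segments to a CV skeleton such as 'CVCCV'."""
    return "".join("V" if s.lower() in vowels else "C" for s in segments)


def clusters(shape):
    """Return every run of two or more consonants in a CV skeleton."""
    return re.findall(r"C{2,}", shape)


def describe(shape):
    """Report saturation (number of clusters) and complexity (cluster sizes)."""
    found = clusters(shape)
    return {
        "shape": shape,
        "saturation": len(found),
        "complexity": [len(c) for c in found],
    }
```

For example, `describe("CVCCVCV")` reports one cluster of size two, matching the description of that shape as a one-cluster word.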

