On the Shapes of the Polish Word: Phonotactic Complexity and Diversity

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Paulina Zydorowicz ◽  
Michał Jankowski ◽  
Katarzyna Dziubalska-Kołaczyk

Abstract The aim of this contribution is to identify the dominant shapes of the Polish word with reference to three criteria: cluster complexity (i.e., cluster size), saturation (the number of clusters in a word), and diversity (in terms of features of consonant description). The dominant word shape is understood as the most frequent or typical skeletal pattern, expressed by means of alternations or groupings of Cs (consonants) and Vs (vowels), e.g., CVCCV etc., or by means of specific features (of place, manner, voice, and the sonorant/obstruent distinction). Our work focuses on two aspects of Polish phonotactics: (1) the relation between cluster complexity and saturation of words with clusters, (2) the degrees of diversity in features of place, manner, and voice within clusters. Using corpus data, we have established that only 4.17% of word shapes have no clusters. The dominant word shape for a one-cluster word is CVCCVCV. The most frequent scenario for a word shape is to contain two clusters, of which 67% are a combination of a word-initial and a word-medial cluster. We have found that: (1) cluster length is inversely proportional to the number of clusters in a word; (2) nearly 73% of word types contain clusters of the same size, e.g., two CCs or two CCCs (Polish words prefer saturation over complexity); (3) manner of articulation (MOA) is more diversified than place of articulation (POA) across clusters and words.
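The notions of skeletal pattern, cluster size, and saturation can be made concrete with a minimal sketch. It assumes the word is already given as a sequence of segments (not raw Polish spelling, where digraphs such as "sz" would need prior segmentation); the vowel inventory `VOWELS` and the function names are illustrative assumptions, not the authors' procedure.

```python
import re

# Assumed, simplified vowel inventory for illustration only.
VOWELS = {"a", "e", "i", "o", "u", "y", "ą", "ę"}

def skeleton(segments, vowels=VOWELS):
    """Map a sequence of segments to a C/V skeletal pattern such as 'CVCCV'."""
    return "".join("V" if seg in vowels else "C" for seg in segments)

def clusters(skel):
    """Return (start_index, length) for every run of two or more consonants."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"C{2,}", skel)]

def position(skel, start, length):
    """Classify a cluster as word-initial, word-final, or word-medial."""
    if start == 0:
        return "initial"
    if start + length == len(skel):
        return "final"
    return "medial"

# Hypothetical usage: a five-segment word with one medial CC cluster.
skel = skeleton(["m", "a", "t", "k", "a"])
found = clusters(skel)
```

Here saturation corresponds to `len(found)` and complexity to the length component of each tuple.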

2004 ◽  
Vol 82 (4) ◽  
pp. 323-329
Author(s):  
A Ulug ◽  
M Karakaplan ◽  
B Ulug

Clustering in some two- and three-dimensional lattices is investigated using an algorithm similar to that of Hoshen–Kopelman. The total number of clusters reveals a maximum at an occupation probability, pmax, where the average cluster size, 2.03 ± 0.07, is found to be independent of the size, dimension, coordination number, and the type of lattice. We discussed the fact that the clustering effectively begins at pmax. The percolation threshold, pc, and pmax are found to get closer to each other as the coordination number increases. PACS Nos.: 64.60.Ht, 64.60.Qb, 82.30.Nr
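Cluster counting of the kind described above can be sketched with a union-find labelling pass, which is the idea underlying Hoshen–Kopelman-style algorithms. This is a minimal illustration for a two-dimensional square lattice with four-neighbour connectivity, not the authors' implementation; the function names and the random-lattice helper are assumptions.

```python
import random

def cluster_sizes(grid):
    """Return the sizes of all clusters of occupied sites, largest first."""
    rows, cols = len(grid), len(grid[0])
    parent = {}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Single raster scan: join each occupied site to its upper and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            parent[(r, c)] = (r, c)
            if r > 0 and grid[r - 1][c]:
                union((r - 1, c), (r, c))
            if c > 0 and grid[r][c - 1]:
                union((r, c - 1), (r, c))

    sizes = {}
    for site in parent:
        root = find(site)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)

def random_lattice(n, p, rng):
    """Build an n-by-n lattice whose sites are occupied with probability p."""
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]
```

Scanning the occupation probability p and recording `len(cluster_sizes(...))` would locate the maximum in the number of clusters; the average cluster size at that point is the mean of the returned list.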


Methodology ◽  
2012 ◽  
Vol 8 (4) ◽  
pp. 146-158 ◽  
Author(s):  
Mirjam Moerbeek

With cluster randomized trials, complete groups of subjects are randomized to treatment conditions. An important question might be whether and when the subjects experience a particular event, such as smoking initiation or recovery from disease. In the social sciences the timing of such events is often measured in discrete time by using time intervals. At the planning phase of a cluster randomized trial one should decide on the number of clusters and cluster size such that parameters are estimated accurately and sufficient power for the test of the treatment effect is achieved. On the basis of a simulation study it is concluded that regression coefficients are estimated more accurately than the variance of the random cluster effect. In addition, it is shown that power increases with cluster size and number of clusters, and that sufficient power cannot always be achieved by using larger cluster sizes at a fixed number of clusters.
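Why larger clusters cannot always rescue power at a fixed number of clusters can be illustrated with the standard design-effect approximation for equal cluster sizes. This is a simplification for a continuous outcome, not the discrete-time survival model studied in the abstract, and the intraclass correlation used below is a hypothetical value.

```python
def design_effect(cluster_size, icc):
    """Variance inflation relative to simple random sampling: 1 + (m - 1) * icc."""
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

# Hypothetical ICC of 0.05: with 10 clusters, the effective sample size
# approaches the ceiling n_clusters / icc = 200 however large the clusters get.
few_large = effective_sample_size(10, 1000, 0.05)
# Doubling the number of clusters with far fewer subjects per cluster does better.
more_small = effective_sample_size(20, 50, 0.05)
```

Under these assumptions, 10,000 subjects in 10 clusters are worth fewer independent observations than 1,000 subjects in 20 clusters, which is the pattern the abstract reports for power.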


2017 ◽  
Vol 42 (2) ◽  
pp. 136-154 ◽  
Author(s):  
Woo-yeol Lee ◽  
Sun-Joo Cho ◽  
Sonya K. Sterba

The current study investigated the consequences of ignoring a multilevel structure for a mixture item response model to show when a multilevel mixture item response model is needed. Study 1 focused on examining the consequence of ignoring dependency for within-level latent classes. Simulation conditions that may affect model selection and parameter recovery in the context of a multilevel data structure were manipulated: class-specific ICC, cluster size, and number of clusters. The accuracy of model selection (based on information criteria) and quality of parameter recovery were used to evaluate the impact of ignoring a multilevel structure. Simulation results indicated that, for the range of class-specific ICCs examined here (.1 to .3), mixture item response models which ignored a higher level nesting structure resulted in less accurate estimates and standard errors (SEs) of item discrimination parameters when the number of clusters was larger than 24 and the cluster size was larger than six. Class-varying ICCs can have compensatory effects on bias. Also, the results suggested that a mixture item response model which ignored multilevel structure was not selected over the multilevel mixture item response model based on Bayesian information criterion (BIC) if the number of clusters and the cluster size were each at least 50. In Study 2, the consequences of unnecessarily fitting a multilevel mixture item response model to single-level data were examined. Reassuringly, in the context of single-level data, a multilevel mixture item response model was not selected by BIC, and its use would not distort the within-level item parameter estimates or SEs when the cluster size was at least 20. Based on these findings, it is concluded that, for class-specific ICC conditions examined here, a multilevel mixture item response model is recommended over a single-level item response model for a clustered dataset having cluster size [Formula: see text] and the number of clusters [Formula: see text].


2019 ◽  
Author(s):  
Joshua Nugent ◽  
Ken Kleinman

Abstract Background: Linear mixed models (LMM) are a common approach to analyzing data from cluster randomized trials (CRTs). Inference on parameters can be performed via Wald tests or likelihood ratio tests (LRT), but both approaches may give incorrect Type I error rates in common finite sample settings. The impact of interactions of cluster size, number of clusters, intraclass correlation coefficient (ICC), and analysis approach on Type I error rates has not been well studied. Reviews of published CRTs find that small sample sizes are not uncommon, so the performance of different inferential approaches in these settings can guide data analysts to the best choices. Methods: Using a random-intercept LMM structure, we use simulations to study Type I error rates with the LRT and Wald test with different degrees of freedom (DF) choices across different combinations of cluster size, number of clusters, and ICC. Results: Our simulations show that the LRT can be anti-conservative when the ICC is large and the number of clusters is small, with the effect most pronounced when the cluster size is relatively large. Wald tests with the Between-Within DF method or the Satterthwaite DF approximation maintain Type I error control at the stated level, though they are conservative when the number of clusters, the cluster size, and the ICC are small. Conclusions: Depending on the structure of the CRT, analysts should choose a hypothesis testing approach that will maintain the appropriate Type I error rate for their data. Wald tests with the Satterthwaite DF approximation work well in many circumstances, but in other cases the LRT may have Type I error rates closer to the nominal level.


2020 ◽  
Author(s):  
Joshua Nugent ◽  
Ken Kleinman

Abstract Background: Linear mixed models (LMM) are a common approach to analyzing data from cluster randomized trials (CRTs). Inference on parameters can be performed via Wald tests or likelihood ratio tests (LRT), but both approaches may give incorrect Type I error rates in common finite sample settings. The impact of different combinations of cluster size, number of clusters, intraclass correlation coefficient (ICC), and analysis approach on Type I error rates has not been well studied. Reviews of published CRTs find that small sample sizes are not uncommon, so the performance of different inferential approaches in these settings can guide data analysts to the best choices. Methods: Using a random-intercept LMM structure, we use simulations to study Type I error rates with the LRT and Wald test with different degrees of freedom (DF) choices across different combinations of cluster size, number of clusters, and ICC. Results: Our simulations show that the LRT can be anti-conservative when the ICC is large and the number of clusters is small, with the effect most pronounced when the cluster size is relatively large. Wald tests with the between-within DF method or the Satterthwaite DF approximation maintain Type I error control at the stated level, though they are conservative when the number of clusters, the cluster size, and the ICC are small. Conclusions: Depending on the structure of the CRT, analysts should choose a hypothesis testing approach that will maintain the appropriate Type I error rate for their data. Wald tests with the Satterthwaite DF approximation work well in many circumstances, but in other cases the LRT may have Type I error rates closer to the nominal level.
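The simulation logic behind a Type I error study can be sketched in a few lines. The sketch below is not the authors' code and does not fit an LMM: as a stand-in, it uses a balanced design and a test on cluster means, then compares a normal critical value (infinite DF) with a t critical value on 8 DF, which is the number of clusters minus two. All parameter values are illustrative assumptions.

```python
import random
import statistics

def simulate_type1(n_sims=2000, clusters_per_arm=5, cluster_size=20,
                   icc=0.2, z_crit=1.96, t_crit=2.306, seed=1):
    """Estimate rejection rates under a true null for two critical values.

    t_crit defaults to the 97.5th percentile of the t distribution with 8 DF,
    which matches clusters_per_arm=5 only; change both together.
    """
    rng = random.Random(seed)
    sd_between = icc ** 0.5          # random-intercept SD (total variance is 1)
    sd_within = (1.0 - icc) ** 0.5   # residual SD
    reject_z = reject_t = 0
    for _ in range(n_sims):
        arms = []
        for _arm in range(2):        # no treatment effect: both arms identical
            means = []
            for _c in range(clusters_per_arm):
                u = rng.gauss(0.0, sd_between)
                ys = [u + rng.gauss(0.0, sd_within) for _ in range(cluster_size)]
                means.append(statistics.fmean(ys))
            arms.append(means)
        a, b = arms
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2.0
        se = (2.0 * pooled_var / clusters_per_arm) ** 0.5
        stat = abs(statistics.fmean(a) - statistics.fmean(b)) / se
        reject_z += int(stat > z_crit)
        reject_t += int(stat > t_crit)
    return reject_z / n_sims, reject_t / n_sims
```

Calling `simulate_type1()` returns the two estimated rejection rates; with few clusters, the normal critical value should reject more often than the nominal 5%, which is the kind of DF sensitivity the abstract examines.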


HortScience ◽  
1998 ◽  
Vol 33 (4) ◽  
pp. 592b-592
Author(s):  
K.R. Woodburn ◽  
J.R. Clark

Flower cluster thinning effects were investigated on A-2274, a large-fruited, seedless table grape selection from the Univ. of Arkansas Grape Breeding Program. The objective of the study was to evaluate flower cluster thinning as a method to enhance cluster size and fill. Treatments included thinning to one flower cluster per shoot, removing one-half of each cluster, and a control (no flowers removed). Each treatment consisted of three, single-vine replications, with each vine being pruned to 32 buds. Removal of entire flower clusters (to one per shoot) resulted in larger clusters and a trend toward higher cluster fill ratings. Berry mass, number of clusters per vine, and yield per vine were unaffected by flower cluster treatment. Berries per cluster were reduced by the partial flower cluster removal treatment. Flower cluster thinning to one cluster proved a beneficial practice in increasing cluster characteristics of this promising selection.


2020 ◽  
Vol 48 (1) ◽  
Author(s):  
Noboru Minakawa ◽  
James O. Kongere ◽  
George O. Sonye ◽  
Peter A. Lutiali ◽  
Beatrice Awuor ◽  
...  

Abstract Background Although long-lasting insecticidal nets (LLINs) are the most effective tool for preventing malaria parasite transmission, the nets have some limitations. For example, the increase of LLIN use has induced the rapid expansion of mosquito insecticide resistance. More than two persons often share one net, which increases the infection risk. To overcome these problems, two new mosquito nets were developed, one incorporating piperonyl butoxide and another covering ceilings and open eaves. We designed a cluster randomized controlled trial (cRCT) to evaluate these nets based on the information provided in the present preliminary study. Results Nearly 75% of the anopheline population in the study area in western Kenya was Anopheles gambiae s. l., and the remainder was Anopheles funestus s. l. More female anophelines were recorded in the western part of the study area. The number of anophelines increased with rainfall. We planned to have 80% power to detect a 50% reduction in female anophelines between the control group and each intervention group. The between-cluster coefficient of variation was 0.192. As the number of clusters was limited to 4 due to the size of the study area, the estimated cluster size was 7 spray catches with an alpha of 0.05. Of 1619 children tested, 626 (48%) were Plasmodium falciparum positive using a rapid diagnostic test (RDT). The prevalence was higher in the northwestern part of the study area. The number of children who slept under bed nets was 929 (71%). The P. falciparum RDT-positive prevalence (RDTpfPR) of net users was 45%, and that of non-users was 55% (OR 0.73; 95% CI 0.56, 0.95). Using 45% RDTpfPR of net users, we expected each intervention to reduce prevalence by 50%. The intracluster correlation coefficient was 0.053. With 80% power and an alpha of 0.05, the estimated cluster size was 116 children. Based on the distribution of children, we modified the boundaries of the clusters and established 300-m buffer zones along the boundaries to minimize a spillover effect. Conclusions The cRCT study design is feasible. As the number of clusters is limited, we will apply a two-stage procedure with the baseline data to evaluate each intervention.

