Effects of Vertical Scaling Methods on Linear Growth Estimation

2011 ◽  
Vol 36 (1) ◽  
pp. 21-39 ◽  
Author(s):  
Pui-Wa Lei ◽  
Yu Zhao

Vertical scaling is necessary to facilitate comparison of scores from test forms of different difficulty levels. It is widely used to enable the tracking of student growth in academic performance over time. Most previous studies of vertical scaling methods assume relatively long tests and large samples, so little is known about their performance when the sample is small or the test is short, challenges that small testing programs often face. This study examined the effects of sample size, test length, and choice of item response theory (IRT) model on the performance of IRT-based scaling methods (concurrent calibration, and separate calibration with the Stocking–Lord, Haebara, Mean/Mean, or Mean/Sigma transformation) in linear growth estimation when the 2-parameter IRT model was appropriate. Results showed that IRT vertical scales could be used for growth estimation without grossly biasing growth parameter estimates when the sample size was not large, as long as the test was not too short (≥20 items), although larger samples generally increased the stability of the growth parameter estimates. Returns in total estimation error reduction from increasing sample size diminished beyond roughly n = 250. Concurrent calibration produced slightly lower total estimation error than separate calibration in the worst combination of short test length (≤20 items) and small sample size (n ≤ 100), whereas separate calibration, except with the Mean/Sigma method, produced similar or somewhat lower total error in the other conditions.
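To make the separate-calibration step concrete, the sketch below illustrates Mean/Sigma linking for the 2PL model, one of the four transformation methods compared above: linking constants are computed from the common items' difficulty estimates, then used to rescale the new form's parameters. All parameter values are hypothetical; this is a minimal illustration, not the authors' implementation.

```python
# A minimal sketch of Mean/Sigma linking for the 2PL model, using
# hypothetical common-item parameter estimates from two calibrations.
import numpy as np

def mean_sigma_link(b_base, b_new):
    """Linking constants that place the new scale onto the base scale."""
    A = np.std(b_base, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_base) - A * np.mean(b_new)
    return A, B

def transform_2pl(a_new, b_new, A, B):
    """Rescale 2PL item parameters: a* = a / A, b* = A * b + B."""
    return a_new / A, A * b_new + B

# Hypothetical difficulty estimates for five common items.
b_base = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])   # base-form calibration
b_new  = np.array([-1.0, -0.3, 0.2, 0.9, 1.3])   # new-form calibration
A, B = mean_sigma_link(b_base, b_new)
a_star, b_star = transform_2pl(np.array([1.1, 0.9, 1.4, 0.7, 1.2]), b_new, A, B)
print(f"A = {A:.3f}, B = {B:.3f}")
```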

2018 ◽  
Vol 43 (7) ◽  
pp. 512-526
Author(s):  
Kyung Yong Kim

When calibrating items using multidimensional item response theory (MIRT) models, item response theory (IRT) calibration programs typically set the probability density of latent variables to a multivariate standard normal distribution to handle three types of indeterminacies: (a) the location of the origin, (b) the unit of measurement along each coordinate axis, and (c) the orientation of the coordinate axes. However, by doing so, item parameter estimates obtained from two independent calibration runs on nonequivalent groups are on two different coordinate systems. To handle this issue and place all the item parameter estimates on a common coordinate system, a process called linking is necessary. Although various linking methods have been introduced and studied for the full MIRT model, little research has been conducted on linking methods for the bifactor model. Thus, the purpose of this study was to provide detailed descriptions of two separate calibration methods and the concurrent calibration method for the bifactor model and to compare the three linking methods through simulation. In general, the concurrent calibration method provided more accurate linking results than the two separate calibration methods, demonstrating better recovery of the item parameters, item characteristic surfaces, and expected score distribution.
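For readers unfamiliar with the model being linked, the sketch below evaluates a bifactor item response function with a logistic link, one general dimension, and one group-specific dimension per item. The parameter values are hypothetical; it only illustrates the kind of item characteristic surface whose recovery is compared above.

```python
# A minimal sketch of a bifactor 2PL-type item response function, assuming
# one general dimension and one group-specific dimension per item.
import numpy as np

def bifactor_prob(theta_g, theta_s, a_g, a_s, d):
    """P(correct) given general/specific abilities, slopes, and intercept d."""
    z = a_g * theta_g + a_s * theta_s + d
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical item: general slope 1.2, specific slope 0.6, intercept -0.3.
print(bifactor_prob(theta_g=0.5, theta_s=-0.2, a_g=1.2, a_s=0.6, d=-0.3))
```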


2021 ◽  
Vol 43 (5) ◽  
Author(s):  
João Claudio Vilvert ◽  
Sérgio Tonetto de Freitas ◽  
Maria Aparecida Rodrigues Ferreira ◽  
Eleonora Barbosa Santiago da Costa ◽  
Edna Maria Mendes Aroucha

The objective of this study was to determine the most efficient sample size required to estimate the mean of postharvest quality traits of ‘Palmer’ mangoes harvested in two growing seasons. A total of 50 mangoes were harvested at maturity stage 2, in winter (June 2020) and spring (October 2020), and evaluated for weight, length, ventral and transverse diameter, skin and pulp L*, C*, and h°, dry matter, firmness, soluble solids (SS), titratable acidity (TA), and the SS/TA ratio. The coefficient of variation (CV) of the fruit quality traits ranged from 2.1% to 18.1%. The highest CV in both harvests was observed for the SS/TA ratio, while the lowest was observed for pulp h°. To estimate the mean of the physicochemical traits of ‘Palmer’ mangoes, 12 fruits are needed in the winter and 14 in the spring, assuming an estimation error of 10% and a 95% confidence level. TA and the SS/TA ratio required the largest sample sizes, while L* and h° required the smallest. In conclusion, variability differed among physicochemical traits and seasons, implying that different sample sizes are required to estimate the mean of different quality traits in different growing seasons.
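A plausible reading of this kind of calculation is the standard CV-based formula n = (t·CV/E)², solved iteratively because the t critical value depends on n. The sketch below is our assumption, with illustrative inputs; it is not necessarily the authors' exact procedure.

```python
# A minimal sketch, assuming the usual CV-based formula for estimating a
# mean to within E% of its value: n = (t_{alpha/2, n-1} * CV / E)^2,
# solved by fixed-point iteration because t depends on n.
import math
from scipy.stats import t

def sample_size_for_mean(cv_percent, error_percent, conf=0.95):
    """Smallest n whose CI half-width, as a % of the mean, is <= error_percent."""
    n = 30.0                                    # starting guess
    for _ in range(100):
        t_crit = t.ppf(1 - (1 - conf) / 2, df=max(n - 1, 1))
        n_new = (t_crit * cv_percent / error_percent) ** 2
        if abs(n_new - n) < 1e-6:
            break
        n = n_new
    return math.ceil(n)

# Hypothetical inputs: a trait with CV = 12%, 10% error, 95% confidence.
print(sample_size_for_mean(12.0, 10.0))
```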


2017 ◽  
Vol 78 (6) ◽  
pp. 925-951 ◽  
Author(s):  
Unkyung No ◽  
Sehee Hong

The purpose of the present study is to compare the performance of mixture modeling approaches (i.e., the one-step approach, three-step maximum-likelihood approach, three-step BCH approach, and LTB approach) across diverse sample size conditions. Two simulation studies were conducted with two different models: a latent class model with three predictor variables and a latent class model with one distal outcome variable. Data were generated under different conditions of sample size (100, 200, 300, 500, 1,000), entropy (0.6, 0.7, 0.8, 0.9), and distal outcome variance (homoscedasticity, heteroscedasticity). Parameter estimate bias, standard error bias, mean squared error, and coverage were used as evaluation criteria. Results demonstrate that the three-step approaches produced more stable and more accurate estimates than the other approaches, even with a sample size as small as 100. This research differs from previous studies in that a wider range of models and smaller sample size conditions were used to compare the approaches. Furthermore, the results supporting the superiority of the three-step approaches even under the least favorable conditions indicate the advantage of these approaches.
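The four evaluation criteria can be computed over simulation replications as in the sketch below; the function and inputs are illustrative, not the authors' code.

```python
# A minimal sketch of the four evaluation criteria named above, computed
# over simulation replications; names and inputs are illustrative.
import numpy as np

def evaluate(estimates, std_errors, true_value, z=1.96):
    """Relative bias, SE bias, MSE, and 95% coverage for one parameter."""
    est = np.asarray(estimates)
    se = np.asarray(std_errors)
    emp_sd = est.std(ddof=1)                          # empirical SD of estimates
    bias = (est.mean() - true_value) / true_value     # relative estimate bias
    se_bias = (se.mean() - emp_sd) / emp_sd           # relative SE bias
    mse = np.mean((est - true_value) ** 2)
    cover = np.mean((est - z * se <= true_value) & (true_value <= est + z * se))
    return bias, se_bias, mse, cover

# Hypothetical replications of one parameter estimate and its SE.
rng = np.random.default_rng(1)
est = rng.normal(0.5, 0.08, size=500)
se = rng.normal(0.08, 0.005, size=500)
print(evaluate(est, se, true_value=0.5))
```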


Author(s):  
B. He ◽  
M. Xie ◽  
T. N. Goh ◽  
P. Ranjan

The control chart based on a Poisson distribution has often been used to monitor the number of defects in sampling units. However, many false alarms can occur due to extra zero counts, especially in high-quality processes. Alternatives have therefore been developed to alleviate this problem, one of which is the control chart based on the zero-inflated Poisson (ZIP) distribution. This distribution accounts for the extra zeros present in the data and yields more accurate results than the Poisson distribution. However, implementing a control chart typically rests on the assumption that the parameters are known or accurately estimated, and for a high-quality process an accurate estimate may require a very large sample, which is seldom available. In this paper, the effect of estimation error is investigated. An analytical approximation is derived to compute the shift detection probability and the run length distribution. The study shows that false alarm rates exceed the desirable level when the sample size is small, which is also reflected in shorter in-control average run lengths. The quantitative results of this paper can be used to select a minimum initial sample size for estimating the control limits so that given average run length requirements are met.
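A minimal sketch of the ZIP model and the run-length logic described above follows; the zero-inflation probability, Poisson mean, and control limit are illustrative values, not taken from the paper.

```python
# A minimal sketch of a zero-inflated Poisson (ZIP) model and the
# average run length (ARL) of a chart with an upper control limit;
# the parameter values and limit are illustrative, not from the paper.
import math

def zip_pmf(k, lam, pi0):
    """ZIP pmf: an extra zero with probability pi0, else Poisson(lam)."""
    pois = math.exp(-lam) * lam**k / math.factorial(k)
    return pi0 + (1 - pi0) * pois if k == 0 else (1 - pi0) * pois

def arl(lam, pi0, ucl):
    """ARL = 1 / P(count > UCL) for a one-sided upper control limit."""
    p_signal = 1 - sum(zip_pmf(k, lam, pi0) for k in range(ucl + 1))
    return math.inf if p_signal == 0 else 1 / p_signal

print(arl(lam=2.0, pi0=0.9, ucl=6))   # in-control ARL under the true model
print(arl(lam=2.4, pi0=0.9, ucl=6))   # shorter ARL after an upward shift in lam
```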


2004 ◽  
Vol 174 (1-2) ◽  
pp. 115-126 ◽  
Author(s):  
J.J Colbert ◽  
Michael Schuckers ◽  
Desta Fekedulegn ◽  
James Rentch ◽  
Máirtı́n MacSiúrtáin ◽  
...  

1969 ◽  
Vol 39 (3) ◽  
pp. 484-510 ◽  
Author(s):  
Richard Light ◽  
Paul Smith

The authors consider Professor Jensen's hypothesis that inherited factors may be "implicated" in observed racial differences in measured intelligence. They argue that even if one accepts Professor Jensen's estimates of the proportion of variance in intelligence accounted for by heredity, environment, and their interaction, his hypothesis is not substantiated by his own data. They further note that the parameter estimates are highly suspect, given the small sample sizes of the twin studies and the way disparate studies were combined. The authors simulated the process of studying twins on a computer and found that the statistical procedures employed in these studies of intelligence yield quite unstable estimates. In particular, the estimate of the interaction effect is quite unreliable, both because of sample size and because Jensen chose a statistical model that would attribute some interaction to the main variables, heredity and environment. Finally, the authors propose that the studies of intelligence reported by Professor Jensen ignore the reality of feedback loops, initiated by physical differences and enhanced by processes of social differentiation in our society.


2016 ◽  
Vol 28 (4) ◽  
pp. 455-466
Author(s):  
Zahra Sedighi Maman ◽  
W. Wade Murphy ◽  
Saeed Maghsoodloo ◽  
Fatemah Haji Ahmadi ◽  
Fadel M. Megahed

2014 ◽  
Vol 38 (5) ◽  
pp. 423-434 ◽  
Author(s):  
Mijke Rhemtulla ◽  
Fan Jia ◽  
Wei Wu ◽  
Todd D. Little

We examine the performance of planned missing (PM) designs for correlated latent growth curve models. Using simulated data from a model where latent growth curves are fitted to two constructs over five time points, we apply three kinds of planned missingness. The first is item-level planned missingness using a three-form design at each wave such that 25% of data are missing. The second is wave-level planned missingness such that each participant is missing up to two waves of data. The third combines both forms of missingness. We find that three-form missingness results in high convergence rates, little parameter estimate or standard error bias, and high efficiency relative to the complete data design for almost all parameter types. In contrast, wave missingness and the combined design result in dramatically lowered efficiency for parameters measuring individual variability in rates of change (e.g., latent slope variances and covariances), and bias in both estimates and standard errors for these same parameters. We conclude that wave missingness should not be used except with large effect sizes and very large samples.
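A minimal sketch of how a three-form item-level missingness pattern like the one above could be assigned: every respondent receives common block X plus two of blocks A, B, and C, so each non-common block is missing for a third of the sample. Block names and sizes are hypothetical.

```python
# A minimal sketch of a three-form planned-missingness assignment.
import numpy as np

def three_form_mask(n_people, blocks, rng=None):
    """Boolean mask (True = observed) given {name: n_items} block sizes."""
    if rng is None:
        rng = np.random.default_rng()
    names = list(blocks)                       # e.g., ['X', 'A', 'B', 'C']
    n_items = sum(blocks.values())
    mask = np.ones((n_people, n_items), dtype=bool)
    starts = np.cumsum([0] + [blocks[b] for b in names])
    form = rng.integers(0, 3, size=n_people)   # which of A/B/C to omit
    for person in range(n_people):
        omit = names[1 + form[person]]         # block X (index 0) is never omitted
        i = names.index(omit)
        mask[person, starts[i]:starts[i + 1]] = False
    return mask

# With equal block sizes, each person is missing 2 of 8 items (25%).
mask = three_form_mask(9, {'X': 2, 'A': 2, 'B': 2, 'C': 2})
print(mask.astype(int))                        # rows: people, cols: items
```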

