The comparison of the scores obtained by Bayesian nonparametric model and classical test theory methods

2021 · Vol 104 (3) · pp. 003685042110283
Author(s): Meltem Yurtcu, Hülya Kelecioglu, Edward L Boone

Bayesian nonparametric (BNP) modelling can be used to obtain more detailed information in test equating studies and to increase the accuracy of equating by accounting for covariates. In this study, two covariates were included in the equating under the BNP model, one continuous and the other discrete. Equated scores were obtained with this model for a single-group design with a small sample and were compared with the mean and linear equating methods of classical test theory. Of the three methods, the BNP model produced equated scores whose distribution was closest to that of the target test. Methods that yield accurate results with small error in small samples make equating studies valuable; in classical test equating, however, including covariates in the model rests on assumptions that are difficult to satisfy, especially with small groups. The BNP model does not share this limitation and is therefore more beneficial than the frequentist methods. Information about booklets and covariates can be obtained from the distributions of the equated scores produced by the BNP model, which makes it possible to compare sub-categories and can indicate the presence of differential item functioning (DIF). The BNP model can therefore be used actively in test equating studies while also providing an opportunity to examine the characteristics of individual participants. It allows test equating even with a small sample and yields values closer to the scores on the target test.
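To make the classical comparison methods concrete, here is a minimal sketch of mean and linear equating under a single-group design, using the standard classical test theory formulas; the data and score values are hypothetical, and this is not the authors' code.

```python
import numpy as np

def mean_equate(x_scores, y_scores, x):
    """Mean equating: shift X so the form means match, e_Y(x) = x - mean(X) + mean(Y)."""
    return x - np.mean(x_scores) + np.mean(y_scores)

def linear_equate(x_scores, y_scores, x):
    """Linear equating: match mean and SD, e_Y(x) = mean(Y) + (sd_Y/sd_X)(x - mean(X))."""
    slope = np.std(y_scores, ddof=1) / np.std(x_scores, ddof=1)
    return np.mean(y_scores) + slope * (x - np.mean(x_scores))

# Hypothetical single-group data: the same small sample took both forms.
rng = np.random.default_rng(0)
x_scores = rng.normal(25, 5, size=30)                  # scores on new form X
y_scores = 0.9 * x_scores + rng.normal(4, 2, size=30)  # scores on target form Y

print(mean_equate(x_scores, y_scores, 25.0))    # equate a raw X score of 25
print(linear_equate(x_scores, y_scores, 25.0))
```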

Author(s): Ansgar Opitz, Moritz Heene, Frank Fischer

Abstract. A significant problem that assessments of scientific reasoning face in higher education is the question of domain generality, that is, whether a test produces biased results for students from different domains. This study applied three recently developed methods of analyzing differential item functioning (DIF) to evaluate the domain-generality assumption of a common scientific reasoning test. Additionally, we evaluated the usefulness of these new tree- and lasso-based methods for analyzing DIF and compared them with methods based on classical test theory. We gave the scientific reasoning test to 507 university students majoring in physics, biology, or medicine. All three DIF analysis methods indicated a domain bias in about one-third of the items, mostly benefiting biology students. The methods based on classical test theory did not detect this bias; instead, they indicated that all items were easier for physics students than for biology students. Thus, the tree- and lasso-based methods provide clear added value for test evaluation. Taken together, our analyses indicate that the scientific reasoning test is neither entirely domain-general nor entirely domain-specific. We advise against using it in high-stakes situations involving domain comparisons.
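As a rough illustration of the lasso-based approach described above, DIF can be screened by fitting, per item, an L1-penalized logistic regression of the item response on a matching score and a group indicator: items whose group coefficient survives the shrinkage are flagged. This is a simplified sketch with simulated data, not the authors' implementation (dedicated methods penalize only the group-by-item terms within a joint IRT model).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated responses: 300 students, 10 binary items, two domain groups.
rng = np.random.default_rng(1)
n, k = 300, 10
group = rng.integers(0, 2, n)                    # hypothetical domain indicator
ability = rng.normal(0, 1, n)
difficulty = rng.normal(0, 1, k)
logits = ability[:, None] - difficulty[None, :]
logits[:, 0] += 0.8 * group                      # planted DIF: item 0 favors group 1
responses = (rng.random((n, k)) < 1 / (1 + np.exp(-logits))).astype(int)

for item in range(k):
    rest = responses.sum(axis=1) - responses[:, item]  # rest score as matching criterion
    X = np.column_stack([rest, group])
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.3)
    model.fit(X, responses[:, item])
    g = model.coef_[0, 1]
    if abs(g) > 1e-8:                            # nonzero after shrinkage -> flag
        print(f"item {item}: group coefficient {g:+.2f} -> possible DIF")
```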


2019 · Vol 29 (4) · pp. 962-986
Author(s): R Gorter, J-P Fox, G Ter Riet, MW Heymans, JWR Twisk

Latent growth models are often used to measure individual trajectories representing change over time. The characteristics of these trajectories depend on the variability in the longitudinal outcomes. In many medical and epidemiological studies, the individual health outcomes cannot be observed directly and are observed indirectly through indicators (i.e. items of a questionnaire). An item response theory or a classical test theory measurement model is then required, and this choice can influence the latent growth estimates. In this study, this influence is assessed directly under various conditions by estimating latent growth parameters on a common scale for item response theory and classical test theory, using a novel plausible value method in combination with Markov chain Monte Carlo. The latent outcomes are treated as missing data, and plausible values are generated from the corresponding posterior distribution, separately for item response theory and classical test theory. These plausible values are linearly transformed to a common scale. A Markov chain Monte Carlo method was developed to simultaneously estimate the latent growth and measurement model parameters using this plausible value technique. It is shown that individual trajectories estimated with item response theory provide a more detailed description of individual change over time than those estimated with classical test theory, since item response patterns are more informative about the health measurements than sum scores.
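The linear transformation of plausible values to a common scale can be sketched as follows; this is a simplified, self-contained illustration with simulated draws, not the authors' MCMC implementation, and it assumes plausible values have already been sampled from each measurement model's posterior.

```python
import numpy as np

def to_common_scale(pv, target_mean, target_sd):
    """Linearly transform plausible values so their mean and SD match
    the target (common) scale: pv* = A * pv + B."""
    A = target_sd / pv.std(ddof=1)
    B = target_mean - A * pv.mean()
    return A * pv + B

rng = np.random.default_rng(2)

# Hypothetical plausible values at one time point: 200 subjects, 5 draws each.
pv_irt = rng.normal(0.0, 1.0, size=(200, 5))    # IRT latent scale
pv_ctt = rng.normal(50.0, 10.0, size=(200, 5))  # CTT sum-score scale

# Put the CTT plausible values on the IRT scale.
pv_ctt_common = to_common_scale(pv_ctt, pv_irt.mean(), pv_irt.std(ddof=1))

# Downstream latent growth estimation would then be run per draw and the
# results pooled across draws, as in multiple imputation.
print(pv_ctt_common.mean(), pv_ctt_common.std(ddof=1))
```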

