Improving Accuracy and Usage by Correctly Selecting: The Effects of Model Selection in Cognitive Diagnosis Computerized Adaptive Testing

2020 ◽  
pp. 014662162097768
Author(s):  
Miguel A. Sorrel ◽  
Francisco José Abad ◽  
Pablo Nájera

Decisions on how to calibrate an item bank might have major implications in the subsequent performance of the adaptive algorithms. One of these decisions is model selection, which can become problematic in the context of cognitive diagnosis computerized adaptive testing, given the wide range of models available. This article aims to determine whether model selection indices can be used to improve the performance of adaptive tests. Three factors were considered in a simulation study, that is, calibration sample size, Q-matrix complexity, and item bank length. Results based on the true item parameters, and general and single reduced model estimates were compared to those of the combination of appropriate models. The results indicate that fitting a single reduced model or a general model will not generally provide optimal results. Results based on the combination of models selected by the fit index were always closer to those obtained with the true item parameters. The implications for practical settings include an improvement in terms of classification accuracy and, consequently, testing time, and a more balanced use of the item bank. An R package was developed, named cdcatR, to facilitate adaptive applications in this context.

1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Summary: Item parameters for several hundred items were estimated from empirical data on several thousand subjects. The logistic one-parameter (1PL) and two-parameter (2PL) model estimates were evaluated. However, model fit showed that only a subset of items complied sufficiently, so that the remaining ones were assembled in well-fitting item banks. In several simulation studies, 5000 simulated responses were generated in accordance with a computerized adaptive test procedure along with person parameters. A general reliability of .80 or a standard error of measurement of .44 was used as a stopping rule to end CAT testing. We also recorded how often each item was used by all simulees. Person-parameter estimates based on CAT correlated higher than .90 with the simulated true values. For all 1PL-fitting item banks, most simulees used more than 20 but fewer than 30 items to reach the pre-set level of measurement error. However, testing based on item banks that complied with the 2PL revealed that, on average, only 10 items were sufficient to end testing at the same measurement error level. Both clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from everyday uses will show whether these trends will hold up in practice. If so, CAT will become possible and reasonable with some 150 well-calibrated 2PL items.
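The two stopping criteria quoted in this summary (reliability of .80, standard error of measurement of .44) can be related with a short sketch. It assumes the classical relation SEM = SD * sqrt(1 - reliability) with the trait scaled to unit variance; the abstract does not state how the two values were linked, and the function names are hypothetical.

```python
import math

def sem_from_reliability(reliability, trait_sd=1.0):
    """SEM implied by a reliability, assuming SEM = SD * sqrt(1 - reliability)."""
    return trait_sd * math.sqrt(1.0 - reliability)

def should_stop(current_se, target_se=0.44):
    """Variable-length stopping rule: stop once the SE is at or below the target."""
    return current_se <= target_se

# Under the unit-variance assumption, a reliability of .80 implies an SEM near .44.
print(round(sem_from_reliability(0.80), 3))  # 0.447
```

Under these assumptions the two criteria are approximately the same rule stated on two scales, which is consistent with the abstract presenting them as alternatives.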


2020 ◽  
Author(s):  
Menghua She ◽  
Yaling Li ◽  
Dongbo Tu ◽  
Yan Cai

Abstract Background: As more and more people suffer from sleep disorders, developing an efficient, cheap, and accurate assessment tool for screening sleep disorders is becoming more urgent. This study developed a computerized adaptive test for sleep disorders (CAT-SD). Methods: A large sample of 1,304 participants was recruited to construct the item pool of the CAT-SD and to investigate its psychometric characteristics. First, analyses of unidimensionality, model fit, item fit, item discrimination parameters, and differential item functioning (DIF) were conducted to construct a final item pool that meets the requirements of item response theory (IRT) measurement. In addition, a simulated CAT study with real response data from participants was performed to investigate the psychometric characteristics of the CAT-SD, including reliability, validity, and predictive utility (sensitivity and specificity). Results: The final unidimensional item bank of the CAT-SD had good item fit, high discrimination, and no DIF; moreover, it had acceptable reliability, validity, and predictive utility. Conclusions: The CAT-SD could be used as an effective and accurate assessment tool for measuring the severity of individuals' sleep disorders, and it offers a brand-new perspective for screening sleep disorders with psychological scales.


Author(s):  
Louise C. Mâsse ◽  
Teresia M. O’Connor ◽  
Yingyi Lin ◽  
Sheryl O. Hughes ◽  
Claire N. Tugault-Lafleur ◽  
...  

Abstract Purpose: There has been a call to improve the measurement rigour and standardization of food parenting practices measures, as well as to align the measurement of food parenting practices with the parenting literature. Drawing from an expert-informed conceptual framework assessing three key domains of food parenting practices (autonomy promotion, control, and structure), this study combined factor analytic methods with Item Response Modeling (IRM) methodology to psychometrically validate responses to the Food Parenting Practice item bank. Methods: A sample of 799 Canadian parents of 5–12-year-old children completed the Food Parenting Practice item bank (129 items measuring 17 constructs). The factorial structure of the responses to the item bank was assessed with confirmatory factor analysis (CFA), confirmatory bi-factor item analysis, and IRM. Following these analyses, Differential Item Functioning (DIF) and Differential Response Functioning (DRF) analyses were used to test invariance properties by parents’ sex, income, and ethnicity. Finally, the efficiency of the item bank was examined using computerized adaptive testing simulations to identify the items to include in a short form. Results: Overall, the expert-informed conceptual framework was predominantly supported by the CFA, as it retained the same 17 constructs included in the conceptual framework, with the exception of the access/availability and permissive constructs, which were respectively renamed covert control and accommodating the child to better reflect the content of the final solution. The bi-factor item analyses and IRM analyses revealed that the solution could be simplified to 11 unidimensional constructs; the full item bank included 86 items (empirical reliability from 0.78 to 0.96, except for 1 construct) and the short form had 48 items. Conclusion: Overall, the food parenting practice item bank has excellent psychometric properties. The item bank includes an expanded version and a short version to meet various study needs. This study provides more efficient tools for assessing how food parenting practices influence child dietary behaviours. Next steps are to use the IRM-calibrated item bank and draw on computerized adaptive testing methodology to administer the item bank and provide flexibility in item selection.


2020 ◽  
Vol 11 ◽  
Author(s):  
Xiaojian Sun ◽  
Yanlou Liu ◽  
Tao Xin ◽  
Naiqing Song

Calibration errors are inevitable and should not be ignored during the estimation of item parameters. Items with calibration error can affect the measurement results of tests. One purpose of the current study is to investigate the impact of calibration errors in item parameter estimation on measurement accuracy, average test length, and test efficiency for variable-length cognitive diagnostic computerized adaptive testing (CD-CAT). The other purpose is to examine methods for reducing the adverse effects of calibration errors. Simulation results show that (1) calibration error has a negative effect on measurement accuracy for the deterministic input, noisy “and” gate (DINA) model and the reduced reparameterized unified model; (2) the average test length is shorter, and the test efficiency is overestimated, for items with calibration errors; (3) the compensatory reparameterized unified model (CRUM) is less affected by calibration errors, and the classification accuracy, average test length, and test efficiency are relatively stable in the CRUM framework; (4) methods such as improving the quality of items, using a large calibration sample to calibrate the item parameters, and using a cross-validation method can reduce the adverse effects of calibration errors on CD-CAT.
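For readers unfamiliar with the DINA model named in this abstract, a minimal sketch of its item response function follows. It uses the standard formulation in which an examinee who masters every attribute the item requires answers correctly with probability 1 - slip, and otherwise with probability guess. The function name and the parameter values are illustrative assumptions, not taken from the study.

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """P(correct) under DINA for attribute profile `alpha` and Q-matrix row `q`."""
    alpha = np.asarray(alpha)
    q = np.asarray(q)
    # eta = 1 only if every attribute required by the item is mastered.
    eta = bool(np.all(alpha >= q))
    return 1.0 - slip if eta else guess

q_row = [1, 1, 0]  # hypothetical item requiring attributes 1 and 2

# Illustrative "true" versus miscalibrated parameters for the same item.
print(dina_prob([1, 1, 0], q_row, slip=0.10, guess=0.20))  # master, true parameters
print(dina_prob([1, 1, 0], q_row, slip=0.05, guess=0.10))  # master, miscalibrated
print(dina_prob([1, 0, 0], q_row, slip=0.10, guess=0.20))  # non-master
```

In this toy case the miscalibrated slip and guess values make the item look more discriminating than it is, which is one plausible way an adaptive test could stop early with overstated efficiency, in line with result (2) above.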


2021 ◽  
Vol 12 ◽  
Author(s):  
Wenyi Wang ◽  
Yukun Tu ◽  
Lihong Song ◽  
Juanjuan Zheng ◽  
Teng Wang

The implementation of cognitive diagnostic computerized adaptive testing often depends on a high-quality item bank. Estimating item parameters online and calibrating the Q-matrix required by the items therefore becomes an important problem in constructing a high-quality item bank for personalized adaptive learning. Previous related research has mainly focused on calibration under a random design, in which new items are randomly assigned to examinees. Although random assignment of new items ensures the randomness of data sampling, some examinees cannot provide enough information for item parameter estimation or Q-matrix calibration of the new items. In order to increase design efficiency, we investigated three adaptive designs under different practical situations: (a) because the non-parametric classification method needs calibrated item attribute vectors, but not item parameters, the first study focused on an optimal design for the calibration of the Q-matrix of the new items based on Shannon entropy; (b) if the Q-matrix of the new items was specified by subject experts, an optimal design was developed for the estimation of item parameters based on Fisher information; and (c) if the Q-matrix and item parameters are unknown for the new items, we developed a hybrid optimal design for simultaneously estimating them. The simulation results showed that the adaptive designs are better than the random design with a limited number of examinees in terms of the correct recovery rate of attribute vectors and the precision of item parameters.


SAGE Open ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 215824401989904
Author(s):  
Wenyi Wang ◽  
Lihong Song ◽  
Teng Wang ◽  
Peng Gao ◽  
Jian Xiong

The purpose of this study is to investigate the relationship between the Shannon entropy procedure and the Jensen–Shannon divergence (JSD) that are used as item selection criteria in cognitive diagnostic computerized adaptive testing (CD-CAT). Because the JSD itself is defined by the Shannon entropy, we apply the well-known relationship between the JSD and Shannon entropy to establish a relationship between the item selection criteria that are based on these two measures. To better understand the relationship between these two item selection criteria, an alternative derivation is also provided. Theoretical derivations and empirical examples have shown that the Shannon entropy procedure and the JSD in CD-CAT have a linear relation under cognitive diagnostic models. Consistent with our theoretical conclusions, simulation results have shown that the two item selection criteria behaved quite similarly in terms of attribute-level and pattern recovery rates under all conditions, and they selected the same set of items for each examinee from an item bank with item parameters drawn from a uniform distribution U(0.1, 0.3) under post hoc simulations. We provide some suggestions for future studies and a discussion of the relationship between the modified posterior-weighted Kullback–Leibler index and the G-DINA (generalized deterministic inputs, noisy “and” gate) discrimination index.
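The linear relation described in this abstract can be illustrated numerically with a standard information-theoretic identity: for a dichotomous item, the expected posterior Shannon entropy equals the current entropy minus the JSD of the class-conditional response distributions, weighted by the current posterior. The sketch below is an illustration of that identity under assumed values, not the authors' derivation; all function names and numbers are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def jsd(prior, p_correct):
    """Weighted JSD of the Bernoulli response distributions across latent classes."""
    prior = np.asarray(prior, dtype=float)
    p_correct = np.asarray(p_correct, dtype=float)
    mix = float(prior @ p_correct)
    h_mix = entropy([mix, 1.0 - mix])
    h_cond = sum(w * entropy([p, 1.0 - p]) for w, p in zip(prior, p_correct))
    return h_mix - h_cond

def expected_posterior_entropy(prior, p_correct):
    """Expected entropy of the posterior over classes after observing the response."""
    prior = np.asarray(prior, dtype=float)
    p_correct = np.asarray(p_correct, dtype=float)
    total = 0.0
    for lik in (p_correct, 1.0 - p_correct):  # response = 1, then response = 0
        marginal = float(prior @ lik)
        if marginal > 0:
            total += marginal * entropy(prior * lik / marginal)
    return total

prior = [0.4, 0.3, 0.2, 0.1]      # assumed posterior over four attribute patterns
p_correct = [0.2, 0.2, 0.2, 0.9]  # assumed P(correct | pattern) for a candidate item

lhs = expected_posterior_entropy(prior, p_correct)
rhs = entropy(prior) - jsd(prior, p_correct)
print(abs(lhs - rhs) < 1e-12)
```

Because the current entropy is the same for every candidate item, minimizing expected posterior entropy and maximizing the JSD rank items identically under this identity, which fits the abstract's report that both criteria selected the same items.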


2008 ◽  
Vol 216 (1) ◽  
pp. 12-21 ◽  
Author(s):  
A. Michiel Hol ◽  
Harrie C.M. Vorst ◽  
Gideon J. Mellenbergh

A computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated using Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure resulted in latent trait estimates qualitatively equivalent to latent trait estimates based on all items, while a substantial reduction in the number of items used could be realized (at the stopping rule of 0.4, about 33% of the 36 items were used).


2013 ◽  
Vol 20 (4) ◽  
pp. 616-626
Author(s):  
Xiao-Juan TANG ◽  
Shu-Liang DING ◽  
Zong-Huo YU
