Computerized Adaptive Testing of Personality Traits

2008 ◽  
Vol 216 (1) ◽  
pp. 12-21 ◽  
Author(s):  
A. Michiel Hol ◽  
Harrie C.M. Vorst ◽  
Gideon J. Mellenbergh

A computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated with Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was then simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure yielded latent trait estimates qualitatively equivalent to those based on all items, while substantially reducing the number of items administered (with a stopping rule of SE < 0.4, about 33% of the 36 items were used).
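As a concrete illustration, here is a minimal Python sketch of this kind of post hoc CAT simulation: graded response model (GRM) category probabilities, maximum-information item selection, EAP scoring on a quadrature grid, and an SE-based stopping rule. The pool size (36 five-category items) and the SE < 0.4 rule follow the abstract; the item parameters themselves are randomly generated placeholders, not the ACL estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_cats = 36, 5
a = rng.uniform(1.0, 2.5, n_items)                                 # discriminations
b = np.sort(rng.normal(0.0, 1.0, (n_items, n_cats - 1)), axis=1)   # ordered thresholds

grid = np.linspace(-4, 4, 81)          # quadrature grid for EAP estimation
prior = np.exp(-0.5 * grid**2)         # standard normal prior
prior /= prior.sum()

def cum_probs(theta, i):
    """Boundary curves P(X >= k), k = 0..n_cats, at each theta in the array."""
    theta = np.atleast_1d(theta)
    ps = 1.0 / (1.0 + np.exp(-a[i] * (theta[:, None] - b[i])))   # (T, n_cats-1)
    ones = np.ones((len(theta), 1))
    return np.hstack([ones, ps, np.zeros_like(ones)])            # (T, n_cats+1)

def cat_probs(theta, i):
    """Category probabilities P(X = k | theta) under the GRM."""
    cp = cum_probs(theta, i)
    return np.clip(cp[:, :-1] - cp[:, 1:], 1e-10, 1.0)

def item_info(theta, i):
    """Samejima's item information: sum_k (dP_k/dtheta)^2 / P_k."""
    cp = cum_probs(theta, i)
    dcp = a[i] * cp * (1.0 - cp)        # derivative of each boundary curve
    dp = dcp[:, :-1] - dcp[:, 1:]       # derivative of each category probability
    return np.sum(dp**2 / cat_probs(theta, i), axis=1)

def simulate_cat(responses, se_stop=0.4):
    """Adaptively 're-administer' a pre-collected response vector."""
    posterior = prior.copy()
    used, available = [], set(range(n_items))
    while available:
        theta_hat = np.sum(grid * posterior)                       # EAP estimate
        se = np.sqrt(np.sum((grid - theta_hat) ** 2 * posterior))  # posterior SD
        if used and se < se_stop:
            break
        # maximum-information item selection at the current estimate
        nxt = max(available, key=lambda j: item_info(theta_hat, j)[0])
        available.remove(nxt)
        used.append(nxt)
        posterior = posterior * cat_probs(grid, nxt)[:, responses[nxt]]
        posterior /= posterior.sum()
    return np.sum(grid * posterior), len(used)

# Generate one simulee's full response vector, then run the simulated CAT.
true_theta = 0.5
resp = np.empty(n_items, dtype=int)
for i in range(n_items):
    p = cat_probs(true_theta, i)[0]
    resp[i] = rng.choice(n_cats, p=p / p.sum())
theta_hat, n_used = simulate_cat(resp)
print(f"EAP estimate {theta_hat:.2f} after {n_used} of {n_items} items")
```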

2017 ◽  
Vol 42 (5) ◽  
pp. 327-342 ◽  
Author(s):  
Muirne C. S. Paap ◽  
Karel A. Kroeze ◽  
Cees A. W. Glas ◽  
Caroline B. Terwee ◽  
Job van der Palen ◽  
...  

As there is currently a marked increase in the use of both unidimensional (UCAT) and multidimensional computerized adaptive testing (MCAT) in psychological and health measurement, the main aim of the present study is to assess the incremental value of using MCAT rather than separate UCATs for each dimension. Simulations are based on empirical data that can be considered typical for health measurement: a large number of dimensions (4), strong correlations among dimensions (.77-.87), and polytomously scored response data. Both variable-length (SE < .316, SE < .387) and fixed-length conditions (total test length of 12, 20, or 32 items) are studied. The item parameters and variance–covariance matrix Φ are estimated with the multidimensional graded response model (GRM). Outcome variables include computerized adaptive test (CAT) length, root mean square error (RMSE), and bias. Both simulated and empirical latent trait distributions are used to sample vectors of true scores. MCATs were generally more efficient (in terms of test length) and more accurate (in terms of RMSE) than their UCAT counterparts. Absolute average bias was highest for variable-length UCATs with termination rule SE < .387. Test length of variable-length MCATs was on average 20% to 25% shorter than the combined test length of the separate UCATs. This study showed that there are clear advantages to using MCAT rather than UCAT in a setting typical for health measurement.
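The two variable-length cutoffs are the conventional standard-error values corresponding to marginal reliabilities of .90 and .85 via rel = 1 − SE², assuming a unit-variance latent trait; a quick check:

```python
# rel = 1 - SE**2 for a unit-variance latent trait
for se in (0.316, 0.387):
    print(f"SE < {se:.3f}  ->  marginal reliability > {1 - se**2:.2f}")
# SE < 0.316  ->  marginal reliability > 0.90
# SE < 0.387  ->  marginal reliability > 0.85
```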


1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Item parameters for several hundred items were estimated from empirical data on several thousand subjects. Estimates under the logistic one-parameter (1PL) and two-parameter (2PL) models were evaluated. However, model fit showed that only a subset of items complied sufficiently, so the remaining well-fitting items were assembled into item banks. In several simulation studies, 5,000 simulated response patterns were generated, along with person parameters, in accordance with a computerized adaptive testing procedure. A reliability of .80 (equivalently, a standard error of measurement of .44) was used as the stopping rule to end CAT testing. We also recorded how often each item was used across simulees. Person-parameter estimates based on CAT correlated higher than .90 with the simulated true values. With the 1PL-fitting item banks, most simulees needed more than 20 but fewer than 30 items to reach the preset level of measurement error. With item banks that complied with the 2PL, however, on average only 10 items were sufficient to end testing at the same measurement error level. Both results clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from everyday use will show whether these trends hold up in practice. If so, CAT will become possible and reasonable with some 150 well-calibrated 2PL items.
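The figures are internally consistent: on a unit-variance scale, SEM = √(1 − reliability) = √.20 ≈ .45, matching the .44 stopping rule, and the 1PL-versus-2PL item counts follow from the fact that a 2PL item contributes at most a²/4 information, attained at θ = b. A back-of-the-envelope sketch, assuming ML scoring (SE = 1/√I) and optimally targeted items, with illustrative discrimination values:

```python
import math

target_se = 0.44                 # the SEM stopping rule (reliability ~ .80)
needed_info = 1 / target_se**2   # ~5.2 units of test information

for label, a in [("1PL bank (a fixed at 1.0)", 1.0),
                 ("2PL bank, high-discrimination items (a = 1.5)", 1.5)]:
    per_item = a**2 / 4          # max information of one 2PL item, at theta = b
    n_items = math.ceil(needed_info / per_item)
    print(f"{label}: about {n_items} items to reach SE <= {target_se}")
# 1PL bank: about 21 items; 2PL bank: about 10 items
```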


2020 ◽  
Vol 11 ◽  
Author(s):  
Xiaojian Sun ◽  
Yanlou Liu ◽  
Tao Xin ◽  
Naiqing Song

Calibration errors are inevitable and should not be ignored during the estimation of item parameters, because items with calibration error can affect the measurement results of tests. One purpose of the current study is to investigate the impact of calibration errors in the estimated item parameters on measurement accuracy, average test length, and test efficiency for variable-length cognitive diagnostic computerized adaptive testing (CD-CAT). The other purpose is to examine methods for reducing the adverse effects of calibration errors. Simulation results show that (1) calibration error has a negative effect on measurement accuracy for the deterministic input, noisy “and” gate (DINA) model and the reduced reparameterized unified model; (2) average test length is shorter, and test efficiency is overestimated, for items with calibration errors; (3) the compensatory reparameterized unified model (CRUM) is less affected by calibration errors, with classification accuracy, average test length, and test efficiency remaining relatively stable under the CRUM framework; (4) methods such as improving item quality, using a large calibration sample to estimate the item parameters, and using cross-validation can reduce the adverse effects of calibration errors on CD-CAT.
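Concretely, under the DINA model an examinee who has mastered every attribute an item requires answers correctly with probability 1 − s (one minus the slip parameter), and otherwise with guessing probability g; a calibration error means the CAT operates with perturbed estimates ŝ and ĝ rather than the true values. A minimal sketch, with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_correct(alpha, q, slip, guess):
    """DINA: eta = 1 iff all attributes required by q are mastered."""
    eta = np.all(alpha >= q, axis=-1).astype(float)
    return (1 - slip) ** eta * guess ** (1 - eta)

q = np.array([1, 1, 0])               # item requires attributes 1 and 2
alpha = np.array([1, 1, 0])           # examinee masters exactly those
s_true, g_true = 0.10, 0.15           # true slip and guessing parameters
calib_err = rng.normal(0, 0.05, 2)    # calibration error, SD = 0.05
s_hat = np.clip(s_true + calib_err[0], 0.01, 0.4)
g_hat = np.clip(g_true + calib_err[1], 0.01, 0.4)

print("true P(correct):", p_correct(alpha, q, s_true, g_true))
print("used P(correct):", p_correct(alpha, q, s_hat, g_hat))
```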


2010 ◽  
Vol 71 (1) ◽  
pp. 37-53 ◽  
Author(s):  
Seung W. Choi ◽  
Matthew W. Grady ◽  
Barbara G. Dodd

2008 ◽  
Vol 24 (1) ◽  
pp. 49-56 ◽  
Author(s):  
Wolfgang A. Rauch ◽  
Karl Schweizer ◽  
Helfried Moosbrugger

In this study, the psychometric properties of the Personal Optimism scale of the POSO-E questionnaire (Schweizer & Koch, 2001) for the assessment of dispositional optimism are evaluated by applying Samejima's (1969) graded response model, a parametric item response theory (IRT) model for polytomous data. Model fit is evaluated extensively via checks on the lower-order margins of the contingency table of observed and expected responses and via visual checks of fit plots comparing observed and expected category response functions. The model proves appropriate for the data; a small amount of misfit is interpreted in terms of previous research using other measures of optimism. Item parameters and information functions show that optimism can be measured accurately, especially at moderately low to middle levels of the latent trait scale, and particularly by the negatively worded items.
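The graphical fit checks are easiest to see in miniature. Below is a sketch of one such check: simulate GRM data for a single five-category item, bin respondents by trait level, and compare observed category proportions with the model-expected category response functions (a tabular analogue of the fit plots described). The item parameters are illustrative, not the POSO-E estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
a_i, b_i = 1.5, np.array([-1.5, -0.5, 0.5, 1.5])   # one 5-category item

def grm_probs(theta):
    """Category probabilities for each theta in the array."""
    ps = 1 / (1 + np.exp(-a_i * np.subtract.outer(theta, b_i)))
    cp = np.hstack([np.ones((len(theta), 1)), ps, np.zeros((len(theta), 1))])
    return cp[:, :-1] - cp[:, 1:]

theta = rng.normal(0, 1, 5000)
probs = grm_probs(theta)
resp = np.array([rng.choice(5, p=p) for p in probs])

bins = np.digitize(theta, np.linspace(-2, 2, 9))    # 10 trait-level bins
for bin_id in np.unique(bins):
    mask = bins == bin_id
    observed = np.bincount(resp[mask], minlength=5) / mask.sum()
    expected = probs[mask].mean(axis=0)
    print(bin_id, np.round(observed, 2), np.round(expected, 2))
```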


SAGE Open ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 215824401989904
Author(s):  
Wenyi Wang ◽  
Lihong Song ◽  
Teng Wang ◽  
Peng Gao ◽  
Jian Xiong

The purpose of this study is to investigate the relationship between the Shannon entropy procedure and the Jensen–Shannon divergence (JSD), both used as item selection criteria in cognitive diagnostic computerized adaptive testing (CD-CAT). Because the JSD is itself defined in terms of the Shannon entropy, we apply the well-known relationship between the JSD and the Shannon entropy to establish a relationship between the item selection criteria based on these two measures. An alternative derivation is also provided to clarify this relationship further. Theoretical derivations and empirical examples show that the Shannon entropy procedure and the JSD criterion in CD-CAT are linearly related under cognitive diagnostic models. Consistent with the theoretical conclusions, simulation results show that the two item selection criteria behaved quite similarly in terms of attribute-level and pattern recovery rates under all conditions, and that they selected the same set of items for each examinee from an item bank whose parameters were drawn from a uniform distribution U(0.1, 0.3) in post hoc simulations. We provide suggestions for future studies and a discussion of the relationship between the modified posterior-weighted Kullback–Leibler index and the G-DINA (generalized deterministic inputs, noisy “and” gate) discrimination index.
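The linear relation has a compact information-theoretic form: the posterior-weighted JSD of an item's conditional response distributions equals the mutual information between the response and the attribute pattern, so the expected posterior entropy equals H(posterior) − JSD, and minimizing expected entropy ranks items exactly as maximizing JSD does. The following sketch (dichotomous items, three binary attributes, arbitrary posterior and item parameters, all illustrative) checks this numerically:

```python
import numpy as np

rng = np.random.default_rng(3)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

n_patterns = 8                                   # 3 binary attributes
post = rng.dirichlet(np.ones(n_patterns))        # current posterior pi(alpha)

for item in range(3):
    p1 = rng.uniform(0.1, 0.9, n_patterns)       # P(X=1 | alpha) per pattern
    cond = np.stack([1 - p1, p1], axis=1)        # (patterns, 2)
    pred = post @ cond                           # predictive distribution P(x)
    jsd = entropy(pred) - np.sum(post * np.array([entropy(c) for c in cond]))
    exp_h = 0.0                                  # expected posterior entropy
    for x in (0, 1):
        upd = post * cond[:, x]
        upd /= upd.sum()
        exp_h += pred[x] * entropy(upd)
    print(f"item {item}: H(post) - JSD = {entropy(post) - jsd:.4f}, "
          f"expected posterior entropy = {exp_h:.4f}")   # the two match
```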


2017 ◽  
Vol 4 (1) ◽  
pp. e7 ◽  
Author(s):  
Tessa Magnée ◽  
Derek P de Beurs ◽  
Berend Terluin ◽  
Peter F Verhaak

Background: Efficient screening questionnaires are useful in general practice. Computerized adaptive testing (CAT) is a method to improve the efficiency of questionnaires, as only the items that are particularly informative for a given responder are dynamically selected. Objective: The objective of this study was to test whether CAT could improve the efficiency of the Four-Dimensional Symptom Questionnaire (4DSQ), a frequently used self-report questionnaire designed to assess common psychosocial problems in general practice. Methods: A simulation study was conducted using a sample of Dutch patients who visited a general practitioner (GP) with psychological problems (n=379). Responders completed a paper-and-pencil version of the 50-item 4DSQ, and a psychometric evaluation was performed to check whether the data met item response theory (IRT) assumptions. Next, a CAT simulation was performed for each of the four 4DSQ scales (distress, depression, anxiety, and somatization), treating the given responses as if they had been collected through CAT. Two stopping rules were applied for the administration of items: (1) stop once the standard error of the trait estimate falls below a predefined level (i.e., sufficient measurement precision is reached), or (2) stop once more than half of the items of the subscale have been administered. Results: In general, the items of each of the four scales agreed with IRT assumptions. Applying the first stopping rule reduced the length of the questionnaire by 38% (from 50 to 31 items on average). When the second stopping rule was also applied, the total number of items was reduced by 56% (from 50 to 22 items on average). Conclusions: CAT seems useful for improving the efficiency of the 4DSQ, reducing its length by 56% without losing a considerable amount of measurement precision. The CAT version of the 4DSQ may be useful as part of an online assessment to investigate the severity of mental health problems of patients visiting a GP. This simulation study is the first step in the development of a CAT version of the 4DSQ, which could be of high value for Dutch GPs, since increasing numbers of patients with mental health problems are visiting general practices. In further research, the results of a real-time CAT should be compared with the results of administering the full scale.
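A minimal sketch of how the two stopping rules combine, written as the predicate a CAT loop would check after each administered item; the SE target and the subscale length used in the example are illustrative assumptions, not values reported in the abstract:

```python
def should_stop(se, n_administered, n_scale_items, se_target=0.39):
    """Rule 1: the SE of the trait estimate has dropped below the target.
    Rule 2: more than half of the subscale's items have been administered."""
    return se < se_target or n_administered > n_scale_items // 2

# Hypothetical example for a 16-item subscale: administration halts after the
# 9th item even though the precision target has not yet been reached.
print(should_stop(se=0.52, n_administered=9, n_scale_items=16))  # True
```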


2020 ◽  
pp. 014662162097768
Author(s):  
Miguel A. Sorrel ◽  
Francisco José Abad ◽  
Pablo Nájera

Decisions on how to calibrate an item bank can have major implications for the subsequent performance of the adaptive algorithm. One of these decisions is model selection, which can become problematic in the context of cognitive diagnosis computerized adaptive testing, given the wide range of models available. This article aims to determine whether model selection indices can be used to improve the performance of adaptive tests. Three factors were considered in a simulation study: calibration sample size, Q-matrix complexity, and item bank length. Results based on the true item parameters, on general-model estimates, and on single-reduced-model estimates were compared with those based on the combination of appropriate models selected for each item. The results indicate that fitting a single reduced model or a general model will not generally provide optimal results; results based on the combination of models selected by the fit index were always closer to those obtained with the true item parameters. The practical implications include improved classification accuracy and, consequently, shorter testing time, and a more balanced use of the item bank. An R package, named cdcatR, was developed to facilitate adaptive applications in this context.
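The per-item combination of models boils down to calibrating each candidate model and letting a fit index pick one model item by item. The sketch below illustrates the idea with AIC and hypothetical calibration results; the log-likelihoods and parameter counts are placeholders, and in practice they would come from a CDM calibration package (e.g., the cdcatR package named in the abstract, or GDINA in R).

```python
def aic(loglik, n_params):
    return -2 * loglik + 2 * n_params

candidates = {          # hypothetical results for one item measuring 2 attributes
    "DINA":   {"loglik": -512.4, "n_params": 2},   # guess + slip
    "A-CDM":  {"loglik": -508.9, "n_params": 3},   # K + 1 parameters
    "G-DINA": {"loglik": -508.1, "n_params": 4},   # 2**K parameters
}

scores = {m: round(aic(r["loglik"], r["n_params"]), 1)
          for m, r in candidates.items()}
best = min(scores, key=scores.get)
print(scores)   # {'DINA': 1028.8, 'A-CDM': 1023.8, 'G-DINA': 1024.2}
print("selected model for this item:", best)     # A-CDM
```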

