Computerized Adaptive Testing of Personality Traits

2008 ◽  
Vol 216 (1) ◽  
pp. 12-21 ◽  
Author(s):  
A. Michiel Hol ◽  
Harrie C.M. Vorst ◽  
Gideon J. Mellenbergh

A computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated with Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was then simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure yielded latent trait estimates qualitatively equivalent to those based on all items, while substantially reducing the number of items administered (with a stopping rule of SE < 0.4, about 33% of the 36 items were used).
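As a concrete illustration, here is a minimal Python sketch of this kind of post hoc CAT simulation: graded response model (GRM) category probabilities, maximum-information item selection, EAP scoring on a quadrature grid, and an SE-based stopping rule. The pool size (36 five-category items) and the SE < 0.4 rule follow the abstract; the item parameters themselves are randomly generated placeholders, not the ACL estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_cats = 36, 5
a = rng.uniform(1.0, 2.5, n_items)                                 # discriminations
b = np.sort(rng.normal(0.0, 1.0, (n_items, n_cats - 1)), axis=1)   # ordered thresholds

grid = np.linspace(-4, 4, 81)          # quadrature grid for EAP estimation
prior = np.exp(-0.5 * grid**2)         # standard normal prior
prior /= prior.sum()

def cum_probs(theta, i):
    """Boundary curves P(X >= k), k = 0..n_cats, at each theta in the array."""
    theta = np.atleast_1d(theta)
    ps = 1.0 / (1.0 + np.exp(-a[i] * (theta[:, None] - b[i])))   # (T, n_cats-1)
    ones = np.ones((len(theta), 1))
    return np.hstack([ones, ps, np.zeros_like(ones)])            # (T, n_cats+1)

def cat_probs(theta, i):
    """Category probabilities P(X = k | theta) under the GRM."""
    cp = cum_probs(theta, i)
    return np.clip(cp[:, :-1] - cp[:, 1:], 1e-10, 1.0)

def item_info(theta, i):
    """Samejima's item information: sum_k (dP_k/dtheta)^2 / P_k."""
    cp = cum_probs(theta, i)
    dcp = a[i] * cp * (1.0 - cp)        # derivative of each boundary curve
    dp = dcp[:, :-1] - dcp[:, 1:]       # derivative of each category probability
    return np.sum(dp**2 / cat_probs(theta, i), axis=1)

def simulate_cat(responses, se_stop=0.4):
    """Adaptively 're-administer' a pre-collected response vector."""
    posterior = prior.copy()
    used, available = [], set(range(n_items))
    while available:
        theta_hat = np.sum(grid * posterior)                       # EAP estimate
        se = np.sqrt(np.sum((grid - theta_hat) ** 2 * posterior))  # posterior SD
        if used and se < se_stop:
            break
        # maximum-information item selection at the current estimate
        nxt = max(available, key=lambda j: item_info(theta_hat, j)[0])
        available.remove(nxt)
        used.append(nxt)
        posterior = posterior * cat_probs(grid, nxt)[:, responses[nxt]]
        posterior /= posterior.sum()
    return np.sum(grid * posterior), len(used)

# Generate one simulee's full response vector, then run the simulated CAT.
true_theta = 0.5
resp = np.empty(n_items, dtype=int)
for i in range(n_items):
    p = cat_probs(true_theta, i)[0]
    resp[i] = rng.choice(n_cats, p=p / p.sum())
theta_hat, n_used = simulate_cat(resp)
print(f"EAP estimate {theta_hat:.2f} after {n_used} of {n_items} items")
```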

2017 ◽  
Vol 42 (5) ◽  
pp. 327-342 ◽  
Author(s):  
Muirne C. S. Paap ◽  
Karel A. Kroeze ◽  
Cees A. W. Glas ◽  
Caroline B. Terwee ◽  
Job van der Palen ◽  
...  

As there is currently a marked increase in the use of both unidimensional (UCAT) and multidimensional computerized adaptive testing (MCAT) in psychological and health measurement, the main aim of the present study is to assess the incremental value of using MCAT rather than separate UCATs for each dimension. Simulations are based on empirical data that can be considered typical for health measurement: a large number of dimensions (4), strong correlations among dimensions (.77-.87), and polytomously scored response data. Both variable-length (SE < .316, SE < .387) and fixed-length conditions (total test length of 12, 20, or 32 items) are studied. The item parameters and variance–covariance matrix Φ are estimated with the multidimensional graded response model (GRM). Outcome variables include computerized adaptive test (CAT) length, root mean square error (RMSE), and bias. Both simulated and empirical latent trait distributions are used to sample vectors of true scores. MCATs were generally more efficient (in terms of test length) and more accurate (in terms of RMSE) than their UCAT counterparts. Absolute average bias was highest for variable-length UCATs with termination rule SE < .387. Test length of variable-length MCATs was on average 20% to 25% shorter than the combined test length of the separate UCATs. This study showed that there are clear advantages to using MCAT rather than UCAT in a setting typical for health measurement.
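The two variable-length cutoffs are the conventional standard-error values corresponding to marginal reliabilities of .90 and .85 via rel = 1 − SE², assuming a unit-variance latent trait; a quick check:

```python
# rel = 1 - SE**2 for a unit-variance latent trait
for se in (0.316, 0.387):
    print(f"SE < {se:.3f}  ->  marginal reliability > {1 - se**2:.2f}")
# SE < 0.316  ->  marginal reliability > 0.90
# SE < 0.387  ->  marginal reliability > 0.85
```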


1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Item parameters for several hundred items were estimated from empirical data on several thousand subjects. Estimates under the logistic one-parameter (1PL) and two-parameter (2PL) models were evaluated. However, model fit showed that only a subset of items complied sufficiently, so the remaining well-fitting items were assembled into item banks. In several simulation studies, 5,000 simulated response patterns were generated, along with person parameters, in accordance with a computerized adaptive testing procedure. A reliability of .80 (equivalently, a standard error of measurement of .44) was used as the stopping rule to end CAT testing. We also recorded how often each item was used across simulees. Person-parameter estimates based on CAT correlated higher than .90 with the simulated true values. With the 1PL-fitting item banks, most simulees needed more than 20 but fewer than 30 items to reach the preset level of measurement error. With item banks that complied with the 2PL, however, on average only 10 items were sufficient to end testing at the same measurement error level. Both results clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from everyday use will show whether these trends hold up in practice. If so, CAT will become possible and reasonable with some 150 well-calibrated 2PL items.
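The figures are internally consistent: on a unit-variance scale, SEM = √(1 − reliability) = √.20 ≈ .45, matching the .44 stopping rule, and the 1PL-versus-2PL item counts follow from the fact that a 2PL item contributes at most a²/4 information, attained at θ = b. A back-of-the-envelope sketch, assuming ML scoring (SE = 1/√I) and optimally targeted items, with illustrative discrimination values:

```python
import math

target_se = 0.44                 # the SEM stopping rule (reliability ~ .80)
needed_info = 1 / target_se**2   # ~5.2 units of test information

for label, a in [("1PL bank (a fixed at 1.0)", 1.0),
                 ("2PL bank, high-discrimination items (a = 1.5)", 1.5)]:
    per_item = a**2 / 4          # max information of one 2PL item, at theta = b
    n_items = math.ceil(needed_info / per_item)
    print(f"{label}: about {n_items} items to reach SE <= {target_se}")
# 1PL bank: about 21 items; 2PL bank: about 10 items
```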


2020 ◽  
Vol 11 ◽  
Author(s):  
Xiaojian Sun ◽  
Yanlou Liu ◽  
Tao Xin ◽  
Naiqing Song

Calibration errors are inevitable and should not be ignored during the estimation of item parameters, because items with calibration error can affect the measurement results of tests. One purpose of the current study is to investigate the impact of calibration errors in the estimated item parameters on measurement accuracy, average test length, and test efficiency for variable-length cognitive diagnostic computerized adaptive testing (CD-CAT). The other purpose is to examine methods for reducing the adverse effects of calibration errors. Simulation results show that (1) calibration error has a negative effect on measurement accuracy for the deterministic input, noisy “and” gate (DINA) model and the reduced reparameterized unified model; (2) average test length is shorter, and test efficiency is overestimated, for items with calibration errors; (3) the compensatory reparameterized unified model (CRUM) is less affected by calibration errors, with classification accuracy, average test length, and test efficiency remaining relatively stable under the CRUM framework; (4) methods such as improving item quality, using a large calibration sample to estimate the item parameters, and using cross-validation can reduce the adverse effects of calibration errors on CD-CAT.
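Concretely, under the DINA model an examinee who has mastered every attribute an item requires answers correctly with probability 1 − s (one minus the slip parameter), and otherwise with guessing probability g; a calibration error means the CAT operates with perturbed estimates ŝ and ĝ rather than the true values. A minimal sketch, with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_correct(alpha, q, slip, guess):
    """DINA: eta = 1 iff all attributes required by q are mastered."""
    eta = np.all(alpha >= q, axis=-1).astype(float)
    return (1 - slip) ** eta * guess ** (1 - eta)

q = np.array([1, 1, 0])               # item requires attributes 1 and 2
alpha = np.array([1, 1, 0])           # examinee masters exactly those
s_true, g_true = 0.10, 0.15           # true slip and guessing parameters
calib_err = rng.normal(0, 0.05, 2)    # calibration error, SD = 0.05
s_hat = np.clip(s_true + calib_err[0], 0.01, 0.4)
g_hat = np.clip(g_true + calib_err[1], 0.01, 0.4)

print("true P(correct):", p_correct(alpha, q, s_true, g_true))
print("used P(correct):", p_correct(alpha, q, s_hat, g_hat))
```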


2010 ◽  
Vol 71 (1) ◽  
pp. 37-53 ◽  
Author(s):  
Seung W. Choi ◽  
Matthew W. Grady ◽  
Barbara G. Dodd

2008 ◽  
Vol 24 (1) ◽  
pp. 49-56 ◽  
Author(s):  
Wolfgang A. Rauch ◽  
Karl Schweizer ◽  
Helfried Moosbrugger

In this study, the psychometric properties of the Personal Optimism scale of the POSO-E questionnaire (Schweizer & Koch, 2001) for the assessment of dispositional optimism are evaluated by applying Samejima's (1969) graded response model, a parametric item response theory (IRT) model for polytomous data. Model fit is evaluated extensively via checks on the lower-order margins of the contingency table of observed and expected responses and via visual checks of fit plots comparing observed and expected category response functions. The model proves appropriate for the data; a small amount of misfit is interpreted in terms of previous research using other measures of optimism. Item parameters and information functions show that optimism can be measured accurately, especially at moderately low to middle levels of the latent trait scale, and particularly by the negatively worded items.
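The graphical fit checks are easiest to see in miniature. Below is a sketch of one such check: simulate GRM data for a single five-category item, bin respondents by trait level, and compare observed category proportions with the model-expected category response functions (a tabular analogue of the fit plots described). The item parameters are illustrative, not the POSO-E estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
a_i, b_i = 1.5, np.array([-1.5, -0.5, 0.5, 1.5])   # one 5-category item

def grm_probs(theta):
    """Category probabilities for each theta in the array."""
    ps = 1 / (1 + np.exp(-a_i * np.subtract.outer(theta, b_i)))
    cp = np.hstack([np.ones((len(theta), 1)), ps, np.zeros((len(theta), 1))])
    return cp[:, :-1] - cp[:, 1:]

theta = rng.normal(0, 1, 5000)
probs = grm_probs(theta)
resp = np.array([rng.choice(5, p=p) for p in probs])

bins = np.digitize(theta, np.linspace(-2, 2, 9))    # 10 trait-level bins
for bin_id in np.unique(bins):
    mask = bins == bin_id
    observed = np.bincount(resp[mask], minlength=5) / mask.sum()
    expected = probs[mask].mean(axis=0)
    print(bin_id, np.round(observed, 2), np.round(expected, 2))
```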


SAGE Open ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 215824401989904
Author(s):  
Wenyi Wang ◽  
Lihong Song ◽  
Teng Wang ◽  
Peng Gao ◽  
Jian Xiong

The purpose of this study is to investigate the relationship between the Shannon entropy procedure and the Jensen–Shannon divergence (JSD), both used as item selection criteria in cognitive diagnostic computerized adaptive testing (CD-CAT). Because the JSD is itself defined in terms of the Shannon entropy, we apply the well-known relationship between the JSD and the Shannon entropy to establish a relationship between the item selection criteria based on these two measures. An alternative derivation is also provided to clarify this relationship further. Theoretical derivations and empirical examples show that the Shannon entropy procedure and the JSD criterion in CD-CAT are linearly related under cognitive diagnostic models. Consistent with the theoretical conclusions, simulation results show that the two item selection criteria behaved quite similarly in terms of attribute-level and pattern recovery rates under all conditions, and that they selected the same set of items for each examinee from an item bank whose parameters were drawn from a uniform distribution U(0.1, 0.3) in post hoc simulations. We provide suggestions for future studies and a discussion of the relationship between the modified posterior-weighted Kullback–Leibler index and the G-DINA (generalized deterministic inputs, noisy “and” gate) discrimination index.
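The linear relation has a compact information-theoretic form: the posterior-weighted JSD of an item's conditional response distributions equals the mutual information between the response and the attribute pattern, so the expected posterior entropy equals H(posterior) − JSD, and minimizing expected entropy ranks items exactly as maximizing JSD does. The following sketch (dichotomous items, three binary attributes, arbitrary posterior and item parameters, all illustrative) checks this numerically:

```python
import numpy as np

rng = np.random.default_rng(3)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

n_patterns = 8                                   # 3 binary attributes
post = rng.dirichlet(np.ones(n_patterns))        # current posterior pi(alpha)

for item in range(3):
    p1 = rng.uniform(0.1, 0.9, n_patterns)       # P(X=1 | alpha) per pattern
    cond = np.stack([1 - p1, p1], axis=1)        # (patterns, 2)
    pred = post @ cond                           # predictive distribution P(x)
    jsd = entropy(pred) - np.sum(post * np.array([entropy(c) for c in cond]))
    exp_h = 0.0                                  # expected posterior entropy
    for x in (0, 1):
        upd = post * cond[:, x]
        upd /= upd.sum()
        exp_h += pred[x] * entropy(upd)
    print(f"item {item}: H(post) - JSD = {entropy(post) - jsd:.4f}, "
          f"expected posterior entropy = {exp_h:.4f}")   # the two match
```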


2017 ◽  
Vol 4 (1) ◽  
pp. e7 ◽  
Author(s):  
Tessa Magnée ◽  
Derek P de Beurs ◽  
Berend Terluin ◽  
Peter F Verhaak

Background: Efficient screening questionnaires are useful in general practice. Computerized adaptive testing (CAT) is a method to improve the efficiency of questionnaires, as only the items that are particularly informative for a given responder are dynamically selected. Objective: The objective of this study was to test whether CAT could improve the efficiency of the Four-Dimensional Symptom Questionnaire (4DSQ), a frequently used self-report questionnaire designed to assess common psychosocial problems in general practice. Methods: A simulation study was conducted using a sample of Dutch patients who visited a general practitioner (GP) with psychological problems (n=379). Responders completed a paper-and-pencil version of the 50-item 4DSQ, and a psychometric evaluation was performed to check whether the data met item response theory (IRT) assumptions. Next, a CAT simulation was performed for each of the four 4DSQ scales (distress, depression, anxiety, and somatization), treating the given responses as if they had been collected through CAT. Two stopping rules were applied for the administration of items: (1) stop once the standard error of the trait estimate falls below a predefined level (i.e., sufficient measurement precision is reached), or (2) stop once more than half of the items of the subscale have been administered. Results: In general, the items of each of the four scales agreed with IRT assumptions. Applying the first stopping rule reduced the length of the questionnaire by 38% (from 50 to 31 items on average). When the second stopping rule was also applied, the total number of items was reduced by 56% (from 50 to 22 items on average). Conclusions: CAT seems useful for improving the efficiency of the 4DSQ, reducing its length by 56% without losing a considerable amount of measurement precision. The CAT version of the 4DSQ may be useful as part of an online assessment to investigate the severity of mental health problems of patients visiting a GP. This simulation study is the first step in the development of a CAT version of the 4DSQ, which could be of high value for Dutch GPs, since increasing numbers of patients with mental health problems are visiting general practices. In further research, the results of a real-time CAT should be compared with the results of administering the full scale.
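A minimal sketch of how the two stopping rules combine, written as the predicate a CAT loop would check after each administered item; the SE target and the subscale length used in the example are illustrative assumptions, not values reported in the abstract:

```python
def should_stop(se, n_administered, n_scale_items, se_target=0.39):
    """Rule 1: the SE of the trait estimate has dropped below the target.
    Rule 2: more than half of the subscale's items have been administered."""
    return se < se_target or n_administered > n_scale_items // 2

# Hypothetical example for a 16-item subscale: administration halts after the
# 9th item even though the precision target has not yet been reached.
print(should_stop(se=0.52, n_administered=9, n_scale_items=16))  # True
```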


2020 ◽  
pp. 014662162097768
Author(s):  
Miguel A. Sorrel ◽  
Francisco José Abad ◽  
Pablo Nájera

Decisions on how to calibrate an item bank can have major implications for the subsequent performance of the adaptive algorithm. One of these decisions is model selection, which can become problematic in the context of cognitive diagnosis computerized adaptive testing, given the wide range of models available. This article aims to determine whether model selection indices can be used to improve the performance of adaptive tests. Three factors were considered in a simulation study: calibration sample size, Q-matrix complexity, and item bank length. Results based on the true item parameters, on general-model estimates, and on single-reduced-model estimates were compared with those based on the combination of appropriate models selected for each item. The results indicate that fitting a single reduced model or a general model will not generally provide optimal results; results based on the combination of models selected by the fit index were always closer to those obtained with the true item parameters. The practical implications include improved classification accuracy and, consequently, shorter testing time, and a more balanced use of the item bank. An R package, named cdcatR, was developed to facilitate adaptive applications in this context.
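The per-item combination of models boils down to calibrating each candidate model and letting a fit index pick one model item by item. The sketch below illustrates the idea with AIC and hypothetical calibration results; the log-likelihoods and parameter counts are placeholders, and in practice they would come from a CDM calibration package (e.g., the cdcatR package named in the abstract, or GDINA in R).

```python
def aic(loglik, n_params):
    return -2 * loglik + 2 * n_params

candidates = {          # hypothetical results for one item measuring 2 attributes
    "DINA":   {"loglik": -512.4, "n_params": 2},   # guess + slip
    "A-CDM":  {"loglik": -508.9, "n_params": 3},   # K + 1 parameters
    "G-DINA": {"loglik": -508.1, "n_params": 4},   # 2**K parameters
}

scores = {m: round(aic(r["loglik"], r["n_params"]), 1)
          for m, r in candidates.items()}
best = min(scores, key=scores.get)
print(scores)   # {'DINA': 1028.8, 'A-CDM': 1023.8, 'G-DINA': 1024.2}
print("selected model for this item:", best)     # A-CDM
```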

