To Implement Computerized Adaptive Testing by Automatically Adjusting Item Difficulty Index on Adaptive English Learning Platform

2021 ◽  
Vol 22 (7) ◽  
pp. 1599-1607
Author(s):  
Shu-Chen Cheng (鄭淑臻) ◽  
Yu-Ping Cheng ◽  
Yueh-Min Huang (黃悅民)

2019 ◽  
Vol 44 (3) ◽  
pp. 182-196
Author(s):  
Jyun-Hong Chen ◽  
Hsiu-Yi Chao ◽  
Shu-Ying Chen

When computerized adaptive testing (CAT) is under stringent item exposure control, the precision of trait estimation decreases substantially. A new item selection method, the dynamic Stratification method based on Dominance Curves (SDC), is proposed to mitigate this problem by improving trait estimation. The objective function of the SDC in item selection is to maximize the sum of test information across all examinees, rather than maximizing item information for an individual examinee at each single-item administration, as in conventional CAT. To achieve this objective, the SDC uses dominance curves to stratify an item pool into a number of strata equal to the test length, increasing the quality of the administered items as the test progresses and reducing the likelihood that a high-discrimination item is administered to an examinee whose ability is not close to the item difficulty. Furthermore, the SDC incorporates a dynamic process for on-the-fly item–stratum adjustment to optimize the use of quality items. Simulation studies were conducted to investigate the performance of the SDC in CAT under item exposure control at different levels of severity. According to the results, the SDC efficiently improves trait estimation in CAT, yielding more precise and more accurate trait estimates than other methods (e.g., the maximum Fisher information method) in most conditions.
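For reference, the baseline the SDC is compared against can be sketched in a few lines. This is a minimal illustration of maximum Fisher information item selection under the 2PL model, not code from the paper; the item pool, parameter values, and function names are assumptions for illustration.

```python
import math

def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b)
    at ability theta: I(theta) = a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item_mfi(theta, pool, administered):
    """Maximum Fisher information rule: among unadministered items,
    pick the one most informative at the current ability estimate."""
    best_idx, best_info = None, -1.0
    for idx, (a, b) in enumerate(pool):
        if idx in administered:
            continue
        info = fisher_info_2pl(theta, a, b)
        if info > best_info:
            best_idx, best_info = idx, info
    return best_idx

# A high-discrimination item whose difficulty matches theta wins:
pool = [(1.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
chosen = select_item_mfi(0.0, pool, set())  # item 1: a=2.0, b=0.0
```

Because MFI greedily favors whatever is most informative right now, high-discrimination items get selected early and often, which is exactly the exposure pressure the SDC's stratification is designed to spread out over the test.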


2011 ◽  
Vol 27 (3) ◽  
pp. 157-163 ◽  
Author(s):  
Tuulia M. Ortner ◽  
Juliane Caspers

We investigated the effects of test anxiety on test performance in computerized adaptive testing (CAT) versus conventional fixed item testing (FIT). We hypothesized that tests containing mainly items with medium probabilities of being solved would have negative effects on the test performance of test-takers high in test anxiety. A total of 110 students (aged 16 to 20) from a German secondary modern school filled out a short form of the Test Anxiety Inventory (TAI-G; Wacker, Jaunzeme, & Jaksztat, 2008) and were then presented with items from the Adaptive Matrices Test (AMT; Hornke, Etzel, & Rettig, 1999) on the computer, either in CAT form or in a fixed item test form with items arranged in order of increasing difficulty. Additionally, half of the students were given a short summary of how item selection works in adaptive testing before working on the CAT. A moderated regression analysis revealed a significant interaction of test anxiety and test mode: the effect of test mode on the AMT score was stronger for students with higher test anxiety than for students with lower test anxiety. Furthermore, receiving information about CAT led to significantly better results than receiving standard test instructions. Results are discussed with reference to test fairness.


2015 ◽  
Vol 23 (88) ◽  
pp. 593-610
Author(s):  
Patrícia Costa ◽  
Maria Eugénia Ferrão

This study aims to provide statistical evidence of the complementarity between classical test theory and item response models for certain educational assessment purposes. Such complementarity might support, at a reduced cost, future development of innovative procedures for item calibration in adaptive testing. Classical test theory and the generalized partial credit model are applied to tests comprising multiple choice, short answer, completion, and open response items scored partially. Datasets are derived from the tests administered to the Portuguese population of students enrolled in the 4th and 6th grades. The results show a very strong association between the estimates of difficulty obtained from classical test theory and item response models, corroborating the statistical theory of mental testing.
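The reported association between classical and IRT difficulty estimates can be illustrated with a small sketch (not the study's code or data): classical difficulty is simply the proportion of correct responses per item, which can then be correlated with IRT difficulty estimates. The toy response matrix, the IRT difficulty values, and the function names below are hypothetical.

```python
import math

def classical_difficulty(responses):
    """Classical test theory item difficulty: proportion of examinees
    (rows) answering each item (column) correctly."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(n_items)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

# Hypothetical 4 examinees x 3 items (1 = correct, 0 = incorrect):
p_values = classical_difficulty([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]])
# Hypothetical IRT b-parameters for the same three items:
r = pearson(p_values, [-1.0, 0.0, 1.0])
```

Note the sign: classical p-values are *easiness* (higher means easier), while IRT b-parameters are *difficulty*, so a strong association between the two appears as a strongly negative correlation.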


1999 ◽  
Vol 12 (2) ◽  
pp. 185-198 ◽  
Author(s):  
Steven L. Wise ◽  
Sara J. Finney ◽  
Craig K. Enders ◽  
Sharon A. Freeman ◽  
Donald D. Severance

2020 ◽  
Author(s):  
John Harmon Wolfe ◽  
Gerald E. Larson

The feasibility of generating items in real time for computerized adaptive testing is explored, using forward digit span as an exemplar. A sample of 531 recruits at the Naval Training Center in San Diego was administered 36 computer-generated forward digit span items of varying lengths. Calibrations showed that item difficulty was a simple linear function of the number of digits in the item, making the difficulty of newly generated items predictable. Simulations of computerized adaptive testing with this approach were conducted, with favorable results.
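The reported linear relation between digit count and difficulty suggests a simple calibration recipe, sketched below under assumed numbers (the span lengths and difficulty values are invented for illustration, not taken from the study): fit a line to calibrated difficulties, then predict the difficulty of a newly generated item from its length alone.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y ≈ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Hypothetical calibrated difficulties (IRT b) for spans of 4-8 digits:
lengths = [4, 5, 6, 7, 8]
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
slope, intercept = fit_line(lengths, difficulties)

# A freshly generated 9-digit item never needs separate calibration;
# its difficulty is predicted directly from its length:
predicted_b_9 = slope * 9 + intercept
```

This is what makes on-the-fly generation viable: once the length-to-difficulty mapping is calibrated, any new item's difficulty is known in advance, so it can be slotted into adaptive selection immediately.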


1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Summary: Item parameters for several hundred items were estimated from empirical data on several thousand subjects. Estimates under the logistic one-parameter (1PL) and two-parameter (2PL) models were evaluated. However, model fit showed that only a subset of items complied sufficiently; these items were assembled into well-fitting item banks. In several simulation studies, 5,000 simulated response patterns and person parameters were generated in accordance with a computerized adaptive testing procedure. A reliability of .80, equivalently a standard error of measurement of .44, was used as the stopping rule to end CAT testing. We also recorded how often each item was used across simulees. Person-parameter estimates based on CAT correlated higher than .90 with the simulated true values. For the 1PL-fitting item banks, most simulees needed more than 20 but fewer than 30 items to reach the preset level of measurement error. However, testing based on item banks that conformed to the 2PL required, on average, only 10 items to end testing at the same measurement error level. Both results clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from everyday use will show whether these trends hold up in practice. If so, CAT will become possible and reasonable with some 150 well-calibrated 2PL items.
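The stopping rule can be made concrete with a small sketch (my assumptions, not the authors' code): in IRT, the standard error of measurement at the current ability estimate is 1/sqrt(test information), so testing ends once accumulated item information pushes the SEM to .44 or below. With unit trait variance, an SEM of .44 corresponds to a reliability of about 1 − .44² ≈ .80, which is why the abstract treats the two criteria as interchangeable.

```python
import math

def should_stop(item_infos, max_sem=0.44):
    """Variable-length CAT stopping rule: stop once the standard error
    of measurement, SEM = 1 / sqrt(sum of item information evaluated at
    the current theta estimate), reaches max_sem or below."""
    total_info = sum(item_infos)
    if total_info <= 0:        # nothing administered yet: keep testing
        return False
    return 1.0 / math.sqrt(total_info) <= max_sem

# Five items each contributing information 1.0 give SEM = 1/sqrt(5) ≈ .447
# (keep testing); a sixth brings SEM to 1/sqrt(6) ≈ .408 (stop).
```

The 1PL-vs-2PL contrast in the abstract falls out of this rule directly: 2PL banks let the algorithm pick high-discrimination items that contribute more information per administration, so the SEM threshold is reached in fewer items.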


Methodology ◽  
2007 ◽  
Vol 3 (1) ◽  
pp. 14-23 ◽  
Author(s):  
Juan Ramon Barrada ◽  
Julio Olea ◽  
Vicente Ponsoda

Abstract. The Sympson-Hetter (1985) method provides a means of controlling the maximum exposure rate of items in computerized adaptive testing. Through a series of simulations, control parameters are set that determine the probability that an item, once selected, is actually administered. This method presents two main problems: it requires a long computation time to calculate the parameters, and the empirical maximum exposure rate ends up slightly above the fixed limit. Van der Linden (2003) presented two alternatives that appear to solve both problems. The impact of these methods on measurement accuracy had not yet been tested. We show that these methods over-restrict the exposure of some highly discriminating items and thus decrease accuracy. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods yield an empirical maximum exposure rate clearly above the goal. A new method, based on an initial estimation of the probability of administration and the probability of selection of the items under the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with the two van der Linden methods. When used with Sympson-Hetter, this option speeds the convergence of the control parameters without decreasing accuracy.
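The core Sympson-Hetter step described above can be sketched as follows (function name, data shapes, and the fallback behavior are my assumptions, not the authors' implementation): each selected item i is administered only with probability k_i; if the probability experiment rejects it, selection falls through to the next-best candidate.

```python
import random

def administer_with_sh(ranked_items, k, rng=None):
    """Sympson-Hetter exposure control (sketch): walk candidate items in
    order of selection preference and administer item i only with
    probability k[i]; a rejected item is skipped for this examinee."""
    rng = rng or random.Random()
    for item in ranked_items:
        if rng.random() < k[item]:
            return item
    return ranked_items[-1]  # all candidates rejected: fall back to last

# An item with k = 0.0 is never administered, one with k = 1.0 always is:
given = administer_with_sh([0, 1], {0: 0.0, 1: 1.0})  # item 1
```

The expensive part the abstract refers to is not this step but the simulations needed beforehand to tune each k_i so that no item's empirical exposure rate exceeds the target.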

