Rotating Item Banks versus Restriction of Maximum Exposure Rates in Computerized Adaptive Testing

2008 ◽  
Vol 11 (2) ◽  
pp. 618-625 ◽  
Author(s):  
Juan Ramón Barrada ◽  
Julio Olea ◽  
Francisco José Abad

If examinees knew part of the content of a computerized adaptive test beforehand, their estimated trait levels would show a marked positive bias. One strategy to avoid this consists of dividing a large item bank into several sub-banks and rotating the sub-bank employed (Ariel, Veldkamp & van der Linden, 2004). This strategy permits substantial improvements in exposure control at little cost to measurement accuracy. However, it is not known whether this option provides better results than using the master bank with a stricter restriction on the maximum exposure rates (Sympson & Hetter, 1985). To investigate this issue, we worked with several simulated banks of 2,100 items, comparing them, in terms of RMSE and overlap rate, with the same banks divided into two, three… up to seven sub-banks. By extensively manipulating the maximum exposure rate in each bank, we found that rotating banks slightly outperformed restricting the maximum exposure rate of the master bank by means of the Sympson-Hetter method.
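The contrast the study draws can be made concrete with a minimal Python sketch (not the authors' code; item counts, parameter values, and function names are illustrative): Sympson-Hetter control decides probabilistically whether a selected item is actually administered, whereas bank rotation simply restricts each examinee to one sub-bank of the master bank.

```python
import numpy as np

def sympson_hetter_pick(candidates, k, rng):
    """Return the first candidate item that passes its Sympson-Hetter
    exposure-control parameter k[i] (the probability of administering
    item i given that it was selected). Candidates are assumed to be
    ordered from most to least informative."""
    for i in candidates:
        if rng.random() <= k[i]:
            return i
    return candidates[-1]  # fall back to the last candidate

def rotated_subbank(examinee_index, n_items, n_subbanks):
    """Rotating-bank strategy: each examinee sees only one sub-bank,
    assigned round-robin (e.g., a 2,100-item master bank split into
    three sub-banks of 700 items)."""
    size = n_items // n_subbanks
    s = examinee_index % n_subbanks
    return np.arange(s * size, (s + 1) * size)

# Illustrative use
rng = np.random.default_rng(0)
k = np.full(2100, 0.8)                      # hypothetical control parameters
item = sympson_hetter_pick([17, 245, 1031], k, rng)
subbank_items = rotated_subbank(examinee_index=7, n_items=2100, n_subbanks=3)
```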

Methodology ◽  
2007 ◽  
Vol 3 (1) ◽  
pp. 14-23 ◽  
Author(s):  
Juan Ramon Barrada ◽  
Julio Olea ◽  
Vicente Ponsoda

Abstract. The Sympson-Hetter (1985) method provides a means of controlling the maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that determine the probability that an item, once selected, is actually administered. This method presents two main problems: it requires a long computation time to calculate the parameters, and the empirical maximum exposure rate ends up slightly above the fixed limit. Van der Linden (2003) presented two alternatives that appear to solve both problems. The impact of these methods on measurement accuracy had not yet been tested. We show how these methods over-restrict the exposure of some highly discriminating items and thus decrease accuracy. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods yield an empirical maximum exposure rate clearly above the target. A new method, based on an initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with the two van der Linden methods. When used with Sympson-Hetter, this option speeds up the convergence of the control parameters without decreasing accuracy.
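As a rough sketch of why this calibration is costly, the standard Sympson-Hetter procedure iterates full CAT simulations until the control parameters stabilize. The Python outline below assumes a hypothetical `simulate_selection_rates` callback that runs one such simulation and returns each item's selection rate; it is not the code of any of the cited papers, only the usual fixed-point update k_i ← min(1, r_max / P(S_i)).

```python
import numpy as np

def calibrate_sh_parameters(simulate_selection_rates, n_items, r_max,
                            n_iterations=50, tol=1e-3):
    """Iteratively adjust Sympson-Hetter control parameters k so that
    empirical exposure rates stay below r_max.

    simulate_selection_rates(k) must run a CAT simulation with the current
    parameters and return P(S_i), the selection rate of every item
    (a hypothetical callback, not part of any published code).
    """
    k = np.ones(n_items)
    for _ in range(n_iterations):
        p_selected = simulate_selection_rates(k)       # P(S_i) from simulation
        # Items selected more often than allowed get their administration
        # probability reduced; the rest are administered whenever selected.
        k_new = np.where(p_selected > r_max,
                         r_max / np.maximum(p_selected, 1e-12),
                         1.0)
        if np.max(np.abs(k_new - k)) < tol:             # convergence check
            return k_new
        k = k_new
    return k
```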


2009 ◽  
Vol 89 (6) ◽  
pp. 589-600 ◽  
Author(s):  
Stephen M. Haley ◽  
Maria A. Fragala-Pinkham ◽  
Helene M. Dumas ◽  
Pengsheng Ni ◽  
George E. Gorton ◽  
...  

Background: Contemporary clinical assessments of activity are needed across the age span for children with cerebral palsy (CP). Computerized adaptive testing (CAT) has the potential to efficiently administer items to children across wide age spans and functional levels. Objective: The objective of this study was to examine the psychometric properties of a new item bank and a simulated computerized adaptive test for assessing activity-level abilities in children with CP. Design: This was a cross-sectional item calibration study. Methods: The convenience sample consisted of 308 children and youth with CP, aged 2 to 20 years (mean=10.7, SD=4.0), recruited from 4 pediatric hospitals. We collected parent-report data on an initial set of 45 activity items. Using an item response theory (IRT) approach, we compared estimated scores from the activity item bank with concurrent instruments, examined discriminant validity, and developed computer simulations of a CAT algorithm with multiple stop rules to evaluate scale coverage, score agreement between the CAT algorithms and the full item bank, and discriminant and concurrent validity. Results: Confirmatory factor analysis supported scale unidimensionality, local item independence, and invariance. Scores from the computer simulations of the prototype CATs with varying stop rules were consistent with scores from the full item bank (r=.93–.98). The activity summary scores discriminated across levels of upper-extremity and gross motor severity and were correlated with the Pediatric Outcomes Data Collection Instrument (PODCI) physical function and sports subscale (r=.86), the Functional Independence Measure for Children (WeeFIM) (r=.79), and the Pediatric Quality of Life Inventory–Cerebral Palsy version (r=.74). Limitations: The sample size was small for IRT item bank and CAT development studies. Another limitation was the oversampling of children with CP at higher functioning levels. Conclusions: The new activity item bank appears promising for use in a CAT application for assessing activity abilities in children with CP across a wide age range and different levels of motor severity.
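For readers unfamiliar with the machinery behind such simulations, a generic sketch of a CAT loop with multiple stop rules follows (2PL items, maximum-information selection, EAP scoring on a grid). The parameters, thresholds, and bank size are illustrative, not the calibrated activity bank.

```python
import numpy as np

def p_positive(theta, a, b):
    """2PL probability of a positive (higher-ability) response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    p = p_positive(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def eap_estimate(responses, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP trait estimate and posterior SD under a standard-normal prior."""
    posterior = np.exp(-0.5 * grid ** 2)
    for u, ai, bi in zip(responses, a, b):
        p = p_positive(grid, ai, bi)
        posterior *= p ** u * (1.0 - p) ** (1 - u)
    posterior /= posterior.sum()
    theta = np.sum(grid * posterior)
    sd = np.sqrt(np.sum((grid - theta) ** 2 * posterior))
    return theta, sd

def simulate_cat(true_theta, a, b, max_items=15, se_stop=0.3, rng=None):
    """Maximum-information CAT with two stop rules: SE threshold or test length."""
    rng = rng if rng is not None else np.random.default_rng()
    used, responses = [], []
    theta, se = 0.0, np.inf
    while len(used) < max_items and se > se_stop:
        info = item_information(theta, a, b)
        if used:
            info[used] = -np.inf                 # never readminister an item
        nxt = int(np.argmax(info))
        used.append(nxt)
        responses.append(int(rng.random() < p_positive(true_theta, a[nxt], b[nxt])))
        theta, se = eap_estimate(responses, a[used], b[used])
    return theta, se, used

# Illustrative bank of 45 items
rng = np.random.default_rng(1)
a_par, b_par = rng.uniform(0.8, 2.5, 45), rng.normal(0.0, 1.0, 45)
print(simulate_cat(true_theta=0.7, a=a_par, b=b_par, rng=rng))
```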


1989 ◽  
Vol 5 (3) ◽  
pp. 311-326 ◽  
Author(s):  
James B. Olsen ◽  
Dennis D. Maynes ◽  
Dean Slawson ◽  
Kevin Ho

This study was designed to compare student achievement scores from three different testing methods: paper-administered testing, computer-administered testing, and computerized adaptive testing. The three testing formats were developed from the California Assessment Program (CAP) item banks for grades three and six. The paper-administered and computer-administered tests were identical in item content, format, and sequence. The computerized adaptive test presented a tailored (adaptive) sequence of the same items used in the computer-administered test.


1999 ◽  
Vol 15 (2) ◽  
pp. 91-98 ◽  
Author(s):  
Lutz F. Hornke

Summary: Item parameters for several hundred items were estimated from empirical data on several thousand subjects. Estimates under the logistic one-parameter (1PL) and two-parameter (2PL) models were evaluated. Model fit showed that only a subset of the items complied sufficiently, so these were assembled into well-fitting item banks. In several simulation studies, responses for 5,000 simulees were generated according to a computerized adaptive testing procedure, along with their person parameters. A general reliability of .80 or, equivalently, a standard error of measurement of .44 was used as the stopping rule to end CAT testing. We also recorded how often each item was administered across simulees. Person-parameter estimates based on CAT correlated higher than .90 with the simulated true values. For all 1PL-fitting item banks, most simulees needed more than 20 but fewer than 30 items to reach the preset level of measurement error. Testing based on item banks that complied with the 2PL, however, revealed that, on average, only 10 items were sufficient to end testing at the same measurement-error level. Both results clearly demonstrate the precision and economy of computerized adaptive testing. Empirical evaluations from everyday use will show whether these trends hold up in practice. If so, CAT will become possible and reasonable with some 150 well-calibrated 2PL items.
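The two stopping criteria quoted here (a reliability of .80 or an SEM of .44) are two statements of the same rule once the trait is scaled to unit variance; a short derivation under that unit-variance assumption:

```latex
% Classical relation between reliability and the standard error of
% measurement, assuming the trait metric is scaled to unit variance:
\[
  \mathrm{SEM} = \sigma_\theta\,\sqrt{1-\rho}
               = \sqrt{1-0.80} \approx 0.447,
\]
% which is the ".44" quoted above. In IRT terms, the CAT therefore stops
% once the accumulated test information at the provisional estimate
% \hat\theta is large enough:
\[
  \mathrm{SE}(\hat\theta) = \frac{1}{\sqrt{\sum_i I_i(\hat\theta)}} \le 0.44
  \quad\Longleftrightarrow\quad
  \sum_i I_i(\hat\theta) \ge \frac{1}{0.44^{2}} \approx 5.2 .
\]
```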


2021 ◽  
Author(s):  
Bryant A Seamon ◽  
Steven A Kautz ◽  
Craig A Velozo

Abstract. Objective: Administrative burden often prevents clinical assessment of balance confidence in people with stroke. A computerized adaptive test (CAT) version of the Activities-specific Balance Confidence Scale (ABC CAT) can dramatically reduce this burden. The objective of this study was to test the precision and efficiency of balance confidence measurement in people with stroke with an ABC CAT. Methods: We conducted a retrospective cross-sectional simulation study with data from 406 adults approximately 2 months post-stroke in the Locomotor Experience Applied Post-Stroke (LEAPS) trial. Item parameters for CAT calibration were estimated with the Rasch model using a random sample of participants (n = 203). Computer simulation was used with response data from the remaining 203 participants to evaluate the ABC CAT algorithm under varying stopping criteria. We compared the estimated levels of balance confidence from each simulation to the actual levels predicted from the Rasch model (Pearson correlations and mean standard error (SE)). Results: Results from simulations with the number of items as the stopping criterion correlated strongly with actual ABC scores (full bank, r = 1; 12-item, r = 0.994; 8-item, r = 0.98; 4-item, r = 0.929). Mean SE increased as the number of items administered decreased (full bank, SE = 0.31; 12-item, SE = 0.33; 8-item, SE = 0.38; 4-item, SE = 0.49). A precision-based stopping rule (mean SE = 0.5) also correlated strongly with actual ABC scores (r = .941) and optimized the trade-off between the number of items administered and precision (mean number of items 4.37, range 4–9). Conclusions: An ABC CAT can yield accurate and precise measures of balance confidence in people with stroke with as few as 4 items. Individuals with lower balance confidence may require more items (up to 9), which may be attributable to the LEAPS trial excluding more functionally impaired persons. Impact Statement: Computerized adaptive testing can drastically reduce the ABC's test administration time while maintaining accuracy and precision. This should greatly enhance clinical utility and facilitate adoption of clinical practice guidelines in stroke rehabilitation. Lay Summary: If you have had a stroke, your physical therapist will likely test your balance confidence. A computerized adaptive test version of the ABC scale can accurately measure balance confidence with as few as 4 questions, which takes much less time.
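The design described here is a post hoc (real-data) simulation: the CAT algorithm re-selects items for each participant but reads that participant's recorded answers rather than generating synthetic ones. A minimal Python sketch, with hypothetical `select_item` and `estimate` callbacks standing in for whatever item-selection and Rasch-based scoring routines are actually used:

```python
import numpy as np

def post_hoc_cat(recorded_responses, a, b, se_stop=0.5, max_items=None,
                 select_item=None, estimate=None):
    """Replay one participant's recorded item responses through a CAT.

    recorded_responses : array of the participant's actual answers to the
        full item bank (here the ABC items), indexed like a and b.
    select_item(theta, a, b, used) and estimate(responses, a, b) are
    hypothetical callbacks for item selection and trait estimation under
    the IRT model the bank was calibrated with (a Rasch-family model here).
    """
    max_items = max_items or len(recorded_responses)
    used, answers = [], []
    theta, se = 0.0, np.inf
    while len(used) < max_items and se > se_stop:
        nxt = select_item(theta, a, b, used)
        used.append(nxt)
        answers.append(recorded_responses[nxt])   # reuse the real answer
        theta, se = estimate(answers, a[used], b[used])
    return theta, se, len(used)
```

Correlating the replayed estimates with the full-bank estimates (Pearson r) and averaging the final SEs across participants yields summaries of the kind reported above.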


2013 ◽  
Vol 93 (5) ◽  
pp. 681-693 ◽  
Author(s):  
I-Ping Hsueh ◽  
Jyun-Hong Chen ◽  
Chun-Hou Wang ◽  
Wen-Hsuan Hou ◽  
Ching-Lin Hsieh

Background: An efficient, reliable, and valid measure for assessing activities of daily living (ADL) function is useful for improving the efficiency of patient management and outcome measurement. Objective: The purpose of this study was to construct a computerized adaptive testing (CAT) system for measuring ADL function in outpatients with stroke. Design: Two cohort studies were conducted at 6 hospitals in Taiwan. Methods: A candidate item bank (44 items) was developed, and 643 outpatients were interviewed. An item response theory model was fitted to the data to estimate the item parameters (eg, difficulty and discrimination) for developing the ADL CAT. Another sample of 51 outpatients was interviewed to examine the concurrent validity and efficiency of the CAT. The ADL CAT, as the outcome measure, and the Barthel Index (BI) and Frenchay Activities Index (FAI) were administered to this second group of participants. Results: Ten items did not satisfy the model's expectations and were deleted; the remaining 34 items were included in the final item bank. Two stopping rules (ie, a reliability coefficient >.9 or a maximum test length of 7 items) were set for the CAT. The participants' ADL scores had an average reliability of .93. The CAT scores were highly associated with those of the full 34-item bank (Pearson r=.98) and closely correlated with those of the combined BI and FAI (r=.82). The time required to complete the CAT was about one fifth of the time needed to administer both the BI and the FAI. Limitations: The participants were outpatients living in the community; further studies are needed to cross-validate the results. Conclusions: The results demonstrate that the ADL CAT is quick to administer, reliable, and valid in outpatients with stroke.
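A small sketch of the dual stopping rule reported for the ADL CAT (reliability > .9 or at most 7 items), using the usual conversion reliability ≈ 1 − SE² on a trait metric scaled to unit variance; that scaling is an assumption of the sketch, not taken from the article.

```python
def should_stop(se_theta: float, n_administered: int,
                reliability_target: float = 0.9, max_items: int = 7) -> bool:
    """Dual stopping rule: stop when the provisional estimate is reliable
    enough (reliability ~ 1 - SE^2 for a unit-variance trait) or when the
    maximum test length has been reached."""
    reliability = 1.0 - se_theta ** 2
    return reliability > reliability_target or n_administered >= max_items

# Example: an SE of 0.30 gives reliability ~0.91, so the CAT would stop.
print(should_stop(se_theta=0.30, n_administered=5))   # True
```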


2015 ◽  
Vol 25 (1) ◽  
pp. 1-11 ◽  
Author(s):  
Morten Aa. Petersen ◽  
Neil K. Aaronson ◽  
Wei-Chu Chie ◽  
Thierry Conroy ◽  
Anna Costantini ◽  
...  

1995 ◽  
Vol 13 (2) ◽  
pp. 151-162 ◽  
Author(s):  
Mary E. Lunz ◽  
Betty Bergstrom

Computerized adaptive testing (CAT) uses a computer algorithm to construct and score the best possible individualized, or tailored, test for each candidate. The computer also provides a complete record of all responses and response changes, as well as their effects on candidate performance. The detail of the data from computerized adaptive tests makes it possible to track initial responses and response alterations, and their effect on candidates' estimated ability measures, as well as the statistical performance of the examination. The purpose of this study was to track the effect of candidate response patterns on a computerized adaptive test. A ninety-item certification examination was divided into nine units of ten items each in order to track the effect of initial responses and response alterations on ability estimates and test precision across the nine test units. Test precision was affected most by response alterations during early segments of the test. Although candidates generally benefit from altering responses, individual candidates showed different patterns of response alteration across test segments. Overall, test precision was only minimally affected, suggesting that the tailoring of the CAT is minimally affected by response alterations.
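A sketch of the bookkeeping implied here: splitting a 90-item adaptive test record into nine 10-item units and counting, per unit, how many initial answers were altered and in which direction. Field names and the data layout are hypothetical; re-estimating ability after each unit, with and without the alterations, would then quantify their effect on the ability estimate and its precision, as the study does.

```python
from typing import Dict, List

def alterations_by_unit(initial: List[int], final: List[int],
                        keys: List[int], unit_size: int = 10) -> List[Dict]:
    """Summarize answer changes per test unit.

    initial, final : the candidate's first and last recorded answer (option
        indices) to each administered item, in administration order.
    keys : the keyed correct option for each administered item.
    Returns one summary dict per unit with counts of alterations,
    wrong-to-right changes, and right-to-wrong changes.
    """
    summaries = []
    for start in range(0, len(initial), unit_size):
        unit = range(start, min(start + unit_size, len(initial)))
        altered = [i for i in unit if initial[i] != final[i]]
        summaries.append({
            "unit": start // unit_size + 1,
            "altered": len(altered),
            "wrong_to_right": sum(initial[i] != keys[i] and final[i] == keys[i]
                                  for i in altered),
            "right_to_wrong": sum(initial[i] == keys[i] and final[i] != keys[i]
                                  for i in altered),
        })
    return summaries
```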


Assessment ◽  
2017 ◽  
Vol 26 (7) ◽  
pp. 1362-1374 ◽  
Author(s):  
Gerard Flens ◽  
Niels Smits ◽  
Caroline B. Terwee ◽  
Joost Dekker ◽  
Irma Huijbrechts ◽  
...  

We used the Dutch–Flemish version of the U.S. PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has the psychometric properties required for CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average of 8.64 items administered in the clinical sample and 9.48 items in the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in the final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss future directions and limitations of CAT development with the Dutch–Flemish version of the PROMIS Anxiety item bank.
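PROMIS item banks are typically calibrated with Samejima's graded response model for their polytomous items; assuming that model here, the sketch below shows the category probabilities and Fisher information that a CAT built on such a bank would maximize when selecting the next anxiety item. The parameter values are illustrative, not the published calibration.

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: probabilities of the m+1 ordered
    categories for one item with discrimination a and ordered
    thresholds b_1 < ... < b_m."""
    # Cumulative probabilities P*(X >= k), with P*(X >= 0) = 1 and
    # P*(X >= m+1) = 0.
    star = [1.0]
    star += [1.0 / (1.0 + np.exp(-a * (theta - b))) for b in thresholds]
    star += [0.0]
    star = np.array(star)
    return star[:-1] - star[1:], star

def grm_item_information(theta, a, thresholds):
    """Fisher information of one graded item at theta."""
    probs, star = grm_category_probs(theta, a, thresholds)
    deriv = a * (star[:-1] * (1 - star[:-1]) - star[1:] * (1 - star[1:]))
    return float(np.sum(deriv ** 2 / np.maximum(probs, 1e-12)))

# Illustrative 5-category anxiety item: a CAT would administer the item
# with the most information at the current provisional theta estimate.
info = grm_item_information(theta=0.5, a=2.1, thresholds=[-1.0, -0.2, 0.6, 1.4])
print(info)
```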

