Hybrid Threshold-Based Sequential Procedures for Detecting Compromised Items in a Computerized Adaptive Testing Licensure Exam

2021 ◽  
pp. 001316442110238
Author(s):  
Chansoon Lee ◽  
Hong Qian

Using classical test theory and item response theory, this study applied sequential procedures to a real operational item pool in variable-length computerized adaptive testing (CAT) to detect items whose security may be compromised. Moreover, this study proposed a hybrid threshold approach to improve the detection power of the sequential procedure while controlling the Type I error rate. The hybrid threshold approach uses a local threshold for each item in an early stage of the CAT administration and then uses the global threshold in the decision-making stage. A series of simulation studies varied several simulation factors to examine which of them contribute significantly to the power rate and lag time of the procedure. In addition to the simulation study, a case study investigated whether the procedures are applicable to the real item pool administered in CAT and can identify potentially compromised items in the pool. This research found that the increment in the probability of a correct answer (p-increment) was the simulation factor most important to the sequential procedures' ability to detect compromised items. This study also found that the local threshold approach improved power rates and shortened lag times when the p-increment was small. The findings of this study could help practitioners implement the sequential procedures using the hybrid threshold approach in real-time CAT administration.
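
As a rough illustration of the hybrid idea described in this abstract, the sketch below monitors one item's proportion correct over a moving window and flags the item when a z-statistic exceeds a threshold: an item-specific (local) threshold early in the administration, then a pool-wide (global) threshold. The choice of statistic, the window size, the switch point, and the threshold values are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' procedure.

```python
import math

def z_stat(responses, p_expected):
    """z-statistic comparing the observed proportion correct with the expected one."""
    n = len(responses)
    p_obs = sum(responses) / n
    se = math.sqrt(p_expected * (1.0 - p_expected) / n)
    return (p_obs - p_expected) / se

def monitor_item(responses, p_expected, local_threshold, global_threshold,
                 window=50, switch_after=200):
    """Return the 1-based administration count at which the item is flagged, or None.

    Hypothetical hybrid rule: apply the item's local threshold for the first
    `switch_after` administrations, then the pool-wide global threshold.
    """
    for n in range(window, len(responses) + 1):
        z = z_stat(responses[n - window:n], p_expected)
        threshold = local_threshold if n <= switch_after else global_threshold
        if z > threshold:
            return n
    return None

# Synthetic responses: 100 administrations at the expected rate of 0.5,
# then every examinee answers correctly (a large p-increment).
clean = [1, 0] * 100
leaked = [1, 0] * 50 + [1] * 100
flag_clean = monitor_item(clean, 0.5, local_threshold=2.0, global_threshold=3.0)
flag_leaked = monitor_item(leaked, 0.5, local_threshold=2.0, global_threshold=3.0)
```

In this toy setup the lag time is the number of administrations between the start of the leak (after response 100) and the flag; a lower local threshold shortens that lag at the cost of more false flags.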

2021 ◽  
pp. 073428292110277
Author(s):  
Ioannis Tsaousis ◽  
Georgios D. Sideridis ◽  
Hannan M. AlGhamdi

This study evaluated the psychometric quality of a computerized adaptive testing (CAT) version of the general cognitive ability test (GCAT), using a simulation study protocol put forth by Han (2018a). For the needs of the analysis, three different sets of items were generated, providing an item pool of 165 items. Before evaluating the efficiency of the GCAT, all items in the final item pool were linked (equated), following a sequential approach. Data were generated from a standard normal distribution (M = 0 and SD = 1) for 10,000 virtual individuals. Using the measure's 165-item bank, the ability value (θ) for each participant was estimated. Maximum Fisher information (MFI) and maximum likelihood estimation with fences (MLEF) were used as item selection and score estimation methods, respectively. For item exposure control, the fade away method (FAM) was preferred. The termination criterion was SE ≤ 0.33. The study revealed that the average number of items administered for the 10,000 participants was 15. Moreover, the precision level in estimating the participant's ability score was very high, as demonstrated by the CBIAS, CMAE, and CRMSE. It is concluded that the CAT version of the test is a promising alternative to administering the corresponding full-length measure since it reduces the number of administered items, prevents high rates of item exposure, and provides accurate scores with minimum measurement error.
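
The loop described in this abstract (select the most informative item, re-estimate ability, stop once SE ≤ 0.33) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's protocol: it assumes a 2PL model, a randomly generated 165-item bank, and EAP scoring on a grid in place of MLEF, and it omits exposure control (FAM) entirely.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response (model choice is an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid from -4 to 4

def eap(answers):
    """EAP estimate and posterior SD under a standard normal prior.

    Swapped in for MLEF purely to keep the sketch short.
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for (a, b), u in answers:
            p = p_correct(t, a, b)
            w *= p if u else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(true_theta, bank, rng, se_target=0.33):
    """Administer items by maximum Fisher information until SE <= se_target."""
    remaining = list(bank)
    answers = []
    theta, se = 0.0, float("inf")
    while remaining and se > se_target:
        item = max(remaining, key=lambda ab: fisher_info(theta, *ab))
        remaining.remove(item)
        u = rng.random() < p_correct(true_theta, *item)  # simulated response
        answers.append((item, u))
        theta, se = eap(answers)
    return theta, se, len(answers)

rng = random.Random(0)
# Hypothetical bank of 165 (discrimination, difficulty) pairs.
bank = [(rng.uniform(1.0, 2.5), rng.uniform(-3.0, 3.0)) for _ in range(165)]
theta_hat, se, n_items = run_cat(0.5, bank, rng)
```

Repeating `run_cat` over many simulated examinees and averaging `n_items` gives the kind of test-length summary the abstract reports.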


2018 ◽  
Vol 28 (8) ◽  
pp. 2385-2403 ◽  
Author(s):  
Tobias Mütze ◽  
Ekkehard Glimm ◽  
Heinz Schmidli ◽  
Tim Friede

Robust semiparametric models for recurrent events have received increasing attention in the analysis of clinical trials in a variety of diseases including chronic heart failure. In comparison to parametric recurrent event models, robust semiparametric models are more flexible in that neither the baseline event rate nor the process inducing between-patient heterogeneity needs to be specified in terms of a specific parametric statistical model. However, implementing group sequential designs in the robust semiparametric model is complicated by the fact that the sequence of Wald statistics does not follow asymptotically the canonical joint distribution. In this manuscript, we propose two types of group sequential procedures for a robust semiparametric analysis of recurrent events. The first group sequential procedure is based on the asymptotic covariance of the sequence of Wald statistics and it guarantees asymptotic control of the type I error rate. The second procedure is based on the canonical joint distribution and does not guarantee asymptotic type I error rate control but is easy to implement and corresponds to the well-known standard approach for group sequential designs. Moreover, we describe how to determine the maximum information when planning a clinical trial with a group sequential design and a robust semiparametric analysis of recurrent events. We contrast the operating characteristics of the proposed group sequential procedures in a simulation study motivated by the ongoing phase 3 PARAGON-HF trial (ClinicalTrials.gov identifier: NCT01920711) in more than 4600 patients with chronic heart failure and a preserved ejection fraction. We found that both group sequential procedures have similar operating characteristics and that for some practically relevant scenarios, the group sequential procedure based on the canonical joint distribution has advantages with respect to the control of the type I error rate. The proposed method for calculating the maximum information results in appropriately powered trials for both procedures.


2020 ◽  
Author(s):  
Menghua She ◽  
Yaling Li ◽  
Dongbo Tu ◽  
Yan Cai

Background: As more and more people suffer from sleep disorders, developing an efficient, cheap and accurate assessment tool for screening sleep disorders is becoming more urgent. This study developed a computerized adaptive test for sleep disorders (CAT-SD). Methods: A large sample of 1,304 participants was recruited to construct the item pool of the CAT-SD and to investigate its psychometric characteristics. First, analyses of unidimensionality, model fit, item fit, item discrimination parameters and differential item functioning (DIF) were conducted to construct a final item pool that meets the requirements of item response theory (IRT) measurement. In addition, a simulated CAT study with the participants' real response data was performed to investigate the psychometric characteristics of the CAT-SD, including reliability, validity and predictive utility (sensitivity and specificity). Results: The final unidimensional item bank of the CAT-SD had good item fit, high discrimination and no DIF; moreover, it had acceptable reliability, validity and predictive utility. Conclusions: The CAT-SD could be used as an effective and accurate assessment tool for measuring the severity of individuals' sleep disorders and offers a brand-new perspective for screening sleep disorders with psychological scales.


Author(s):  
Shu-Ching Ma ◽  
Willy Chou ◽  
Tsair-Wei Chien ◽  
Julie Chi Chow ◽  
Yu-Tsen Yeh ◽  
...  

BACKGROUND Workplace bullying has been measured in many studies to investigate its effects on mental health issues. However, none have used web-based computerized adaptive testing (CAT) with bully classifications and convolutional neural networks (CNN) for reporting the extent of individual bullying in the workplace. OBJECTIVE This study aims to build a model using CNN to develop an app for automatic detection and classification of nurse bullying levels, incorporated with online Rasch computerized adaptive testing, to help assess nurse bullying at an earlier stage. METHODS We recruited 960 nurses working in a Taiwan Chi-Mei hospital group to fill out the 22-item Negative Acts Questionnaire-Revised (NAQ-R) in August 2012. k-means clustering and the CNN were used as unsupervised and supervised learning methods, respectively, for: (1) dividing nurses into three classes (n=918, 29, and 13 with suspected mild, moderate, and severe extent of being bullied, respectively); and (2) building a bully prediction model to estimate 69 different parameters. Finally, data were separated into training and testing sets in a proportion of 70:30, where the former was used to predict the latter. We calculated the sensitivity, specificity, and receiver operating characteristic curve (area under the curve [AUC]), along with the accuracy across studies for comparison. An app predicting the respondent's bullying level was developed, involving the model's 69 estimated parameters and the online Rasch CAT module as a website assessment. RESULTS We observed that: (1) the 22-item model yields higher accuracy rates for three categories, with an accuracy of 94% for the total 960 cases, and accuracies of 99% (AUC 0.99; 95% CI 0.99-1.00) and 83% (AUC 0.94; 95% CI 0.82-0.99) for the lower and upper groups (cutoff points at 49 and 66 points) based on the 947 cases and 42 cases, respectively; and (2) the 700-case training set, with 95% accuracy, predicts the 260-case testing set, reaching an accuracy of 97%. Thus, a NAQ-R app for nurses that predicts bullying level was successfully developed and demonstrated in this study. CONCLUSIONS The 22-item CNN model, combined with the Rasch online CAT, is recommended for improving the accuracy of the nurse NAQ-R assessment. An app developed for helping nurses self-assess workplace bullying at an early stage is required for application in the future.


1998 ◽  
Vol 23 (4) ◽  
pp. 291-322 ◽  
Author(s):  
Hai Jiang ◽  
William Stout

One emphasis in the development and evaluation of SIBTEST has been the control of Type I error (false flagging of non-differential item functioning [DIF] items) inflation and estimation bias. SIBTEST has performed well in comparative simulation studies of Type I error and estimation bias relative to other procedures such as Mantel-Haenszel and logistic regression. Nevertheless, for a minority of cases that might occur in applications, it has displayed sizable Type I error inflation and estimation bias. A vital part of SIBTEST is the regression correction, which adjusts for the Type I error-inflating and estimation-biasing influence of group target ability differences by using the linear regression of true on observed matching subtest scores from classical test theory. In this paper, we propose a new regression correction, using essentially a two-segment piecewise linear regression of the true on observed matching subtest scores. A realistic simulation study of the new approach shows that when there is a clear group ability distributional difference, the new approach displays improved SIBTEST Type I error performance; when there is no group ability distributional difference, its Type I error rate is comparable to that of the current SIBTEST. We have also conducted a power study, which indicates that the new approach has, on average, power similar to that of the current SIBTEST. We conclude that the new version of SIBTEST, although not perfectly robust, seems appropriately robust against sizable Type I error inflation, while retaining other desirable features of the current version.


2015 ◽  
Vol 23 (88) ◽  
pp. 593-610
Author(s):  
Patrícia Costa ◽  
Maria Eugénia Ferrão

This study aims to provide statistical evidence of the complementarity between classical test theory and item response models for certain educational assessment purposes. Such complementarity might support, at a reduced cost, future development of innovative procedures for item calibration in adaptive testing. Classical test theory and the generalized partial credit model are applied to tests comprising multiple choice, short answer, completion, and open response items scored partially. Datasets are derived from the tests administered to the Portuguese population of students enrolled in the 4th and 6th grades. The results show a very strong association between the estimates of difficulty obtained from classical test theory and item response models, corroborating the statistical theory of mental testing.
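
To make the comparison in this abstract concrete, the sketch below computes classical item difficulty (proportion correct) from a small response matrix and correlates it with a set of IRT difficulty values. The response data and the IRT values are invented for illustration, and the sketch uses dichotomous items only, whereas the study applies the generalized partial credit model to partially scored items.

```python
import math

def ctt_difficulty(responses):
    """Proportion correct per item from a persons x items matrix of 0/1 scores."""
    n_persons = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_persons for j in range(n_items)]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scored responses: 5 examinees (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
p_values = ctt_difficulty(responses)

# Hypothetical IRT difficulty estimates for the same four items.
irt_b = [-1.2, -1.0, 0.4, 1.5]
r = pearson(p_values, irt_b)
```

The correlation is negative by construction of the two scales: a higher proportion correct means an easier item, while a higher IRT difficulty means a harder one, so a strong association shows up as a value near -1.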

