test length
Recently Published Documents


TOTAL DOCUMENTS: 155 (FIVE YEARS: 23)

H-INDEX: 16 (FIVE YEARS: 1)

Author(s): Burhanettin Ozdemir, Selahattin Gelbal

Abstract: Computerized adaptive tests (CAT) apply an adaptive process in which items are tailored to individuals' ability scores. Multidimensional CAT (MCAT) designs differ in the item selection, ability estimation, and termination methods they use. This study investigates the performance of MCAT designs used to measure students' language ability and compares their results with the outcomes of the corresponding paper-and-pencil tests. For this purpose, items from the English Proficiency Tests (EPT) were used to create a multidimensional item pool of 599 items. The performance of the MCAT designs was evaluated and compared based on reliability coefficients, root mean square error (RMSE), test length, and root mean squared difference (RMSD) statistics. In total, 36 different conditions were investigated. The results of the post-hoc simulation designs indicate that MCAT designs with the A-optimality item selection method outperformed MCAT designs with other item selection methods, decreasing test length and RMSD values without any sacrifice in test reliability. Additionally, for each MCAT algorithm with A-optimality item selection, the best error-variance stopping rule was 0.25, yielding an average test length of 27.9 items, and the best fixed test-length stopping rule was 30 items for the Bayesian MAP method. Overall, the MCAT designs tended to decrease test length by 60 to 65 percent and provided ability estimates with higher precision than the traditional paper-and-pencil tests of 65 to 75 items. It is therefore suggested to use the A-optimality method for item selection and the Bayesian MAP method for ability estimation in MCAT designs, since the MCAT algorithm with these specifications outperformed the others.
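
The abstract above does not include code, but a minimal sketch of what A-optimality item selection looks like for a compensatory multidimensional 2PL model is given below; the pool size, discrimination vectors, intercepts, and prior precision are all illustrative assumptions, not values from the EPT pool described in the study.

```python
# Sketch: A-optimality item selection for a compensatory multidimensional 2PL CAT.
import numpy as np

def item_prob(a, d, theta):
    """P(correct) for one multidimensional 2PL item with slopes a and intercept d."""
    return 1.0 / (1.0 + np.exp(-(a @ theta + d)))

def item_information(a, d, theta):
    """Fisher information matrix contributed by one item at ability vector theta."""
    p = item_prob(a, d, theta)
    return p * (1.0 - p) * np.outer(a, a)

def select_item_a_optimality(A, D, administered, theta_hat, prior_prec):
    """A-optimality: choose the unadministered item minimizing the trace of the
    inverse of (prior + administered + candidate) information, i.e., the summed
    posterior variance across ability dimensions."""
    acc = prior_prec.copy()
    for j in administered:
        acc += item_information(A[j], D[j], theta_hat)
    best_j, best_trace = None, np.inf
    for j in range(len(A)):
        if j in administered:
            continue
        cand = acc + item_information(A[j], D[j], theta_hat)
        tr = np.trace(np.linalg.inv(cand))
        if tr < best_trace:
            best_j, best_trace = j, tr
    return best_j

# Toy usage: 3 ability dimensions, a 50-item pool, identity prior precision.
rng = np.random.default_rng(0)
A = np.abs(rng.normal(1.0, 0.3, size=(50, 3)))   # illustrative discrimination vectors
D = rng.normal(0.0, 1.0, size=50)                # illustrative intercepts
next_item = select_item_a_optimality(A, D, administered={0, 7},
                                     theta_hat=np.zeros(3),
                                     prior_prec=np.eye(3))
print(next_item)
```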


2022, Vol 12 (1), pp. 168
Author(s): Eisa Abdul-Wahhab Al-Tarawnah, Mariam Al-Qahtani

This study compares the effect of test length on the accuracy of ability parameter estimation in the two-parameter and three-parameter logistic models, using the Bayesian prior-mode method and the maximum likelihood method. An experimental approach is followed, using Monte Carlo simulation. The study population consists of all subjects with the specified ability level, and random samples of subjects and of items are used. Results reveal that, under both the maximum likelihood method and the Bayesian method, the estimation accuracy of the ability parameter in the two-parameter logistic model increases as the number of test items increases. Results also show that for long and average-length tests the maximum likelihood method was more effective across all sample-size conditions, whereas for short tests the Bayesian prior-mode method outperformed it in all conditions. For the three-parameter logistic model, estimation accuracy of the ability parameter likewise increases as the number of test items increases; the Bayesian method outperforms in estimation accuracy across all sample-size conditions, whereas for long tests the maximum likelihood method outperforms across all conditions.

Received: 17 September 2021 / Accepted: 24 November 2021 / Published: 3 January 2022
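
As a rough illustration of the two estimators being compared, the sketch below contrasts maximum likelihood with a Bayesian prior-mode estimate under the 2PL; treating "prior mode" as a MAP estimate with a standard-normal prior is an assumption, and the item parameters and response pattern are made up.

```python
# Sketch: MLE vs. Bayesian prior-mode (MAP) ability estimation under the 2PL.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(theta, a, b, x):
    """Negative log-likelihood of response vector x under the 2PL at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

def estimate_theta(a, b, x, method="MLE"):
    """MLE maximizes the likelihood alone; MAP adds a standard-normal prior,
    which shrinks estimates toward 0 and stabilizes short tests."""
    def objective(theta):
        nll = neg_log_likelihood(theta, a, b, x)
        if method == "MAP":
            nll += 0.5 * theta ** 2    # -log N(0, 1) prior, up to a constant
        return nll
    return minimize_scalar(objective, bounds=(-4, 4), method="bounded").x

# Toy example: a short 10-item test answered entirely correctly, where MLE
# runs to the boundary but MAP returns a finite, shrunken estimate.
rng = np.random.default_rng(1)
a, b = rng.uniform(0.8, 2.0, 10), rng.normal(0.0, 1.0, 10)
x = np.ones(10)
print(estimate_theta(a, b, x, "MLE"), estimate_theta(a, b, x, "MAP"))
```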


Author(s): Riswan Riswan

Item response theory (IRT) models contain one or more parameters, and because these parameters are unknown they must be estimated. This paper aims (1) to determine the effect of sample size (N) on the stability of item parameter estimates, (2) to determine the effect of test length (n) on the stability of examinee parameter estimates, (3) to determine the effect of the model on the stability of item and examinee parameter estimates, (4) to determine the joint effect of sample size and test length on item and examinee parameter estimates, and (5) to determine the joint effect of sample size, test length, and model on item and examinee parameter estimates. This paper reports a simulation study in which latent trait (θ) samples were drawn from a standard normal population, ~N(0, 1), with specific sample sizes (N) and test lengths (n) under the 1PL, 2PL, and 3PL models using WinGen. Item analysis was carried out using both the classical test theory approach and modern test theory (item response theory), and the data were analyzed in R with the ltm package. The results showed that the larger the sample size (N), the more stable the item parameter estimates, and the greater the test length (n), the more stable the examinee parameter (θ) estimates.
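
A minimal stand-in for the data-generation step described above is sketched below for the 1PL/2PL/3PL models; the parameter distributions and the N = 1000, n = 40 design point are illustrative choices, not the study's conditions or WinGen's defaults.

```python
# Sketch: generating dichotomous response data under the 1PL/2PL/3PL models.
import numpy as np

def simulate_responses(N, n, model="3PL", seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(N)                        # latent trait ~ N(0, 1)
    b = rng.normal(0.0, 1.0, n)                           # difficulty
    a = rng.lognormal(0.0, 0.3, n) if model != "1PL" else np.ones(n)   # discrimination
    c = rng.uniform(0.0, 0.25, n) if model == "3PL" else np.zeros(n)   # guessing
    p = c + (1 - c) / (1 + np.exp(-a * (theta[:, None] - b[None, :])))
    return (rng.uniform(size=(N, n)) < p).astype(int), theta, (a, b, c)

# Larger N should yield more stable item-parameter estimates and larger n more
# stable theta estimates; here we only illustrate one design point.
X, theta, (a, b, c) = simulate_responses(N=1000, n=40, model="3PL")
print(X.shape, X.mean())
```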


2021, pp. 014662162110517
Author(s): Joseph A. Rios, Jiayi Deng

An underlying threat to the validity of reliability measures is the introduction of systematic variance in examinee scores from unintended constructs that differ from those assessed. One construct-irrelevant behavior that has gained increased attention in the literature is rapid guessing (RG), which occurs when examinees answer quickly with intentional disregard for item content. To examine the degree of distortion in coefficient alpha due to RG, this study compared alpha estimates between conditions in which simulees engaged in full solution behavior (i.e., no RG) versus partial RG behavior. This was done in a simulation study in which the percentage and ability characteristics of rapid responders, as well as the percentage and pattern of RG, were manipulated. After controlling for test length and difficulty, the average degree of distortion in estimates of coefficient alpha due to RG ranged from −.04 to .02 across 144 conditions. Although slight differences were noted between conditions differing in RG pattern and RG responder ability, the findings suggest that estimates of coefficient alpha are largely robust to RG arising from cognitive fatigue and a low perceived probability of success.
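
To make the design concrete, the sketch below computes coefficient alpha and then injects rapid guessing (random responses with a 25 percent success rate) into a subset of simulees; the 10 percent responder rate, the 25 percent RG pattern, and the Rasch-type data generation are illustrative assumptions rather than the study's conditions.

```python
# Sketch: distortion in coefficient alpha after injecting rapid guessing.
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha for an examinee-by-item score matrix."""
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
N, n = 2000, 40
theta = rng.standard_normal(N)
b = rng.normal(0.0, 1.0, n)
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
X = (rng.uniform(size=(N, n)) < p).astype(int)            # full solution behavior

X_rg = X.copy()
guessers = rng.choice(N, size=N // 10, replace=False)      # 10% rapid responders
rg_items = rng.choice(n, size=n // 4, replace=False)       # RG on 25% of their items
X_rg[np.ix_(guessers, rg_items)] = (
    rng.uniform(size=(len(guessers), len(rg_items))) < 0.25  # chance-level success
).astype(int)

print(cronbach_alpha(X), cronbach_alpha(X_rg))
```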


2021
Author(s): Gáspár Lukács

The Response Time Concealed Information Test (RT-CIT) can reveal that a person recognizes a relevant item (the probe, e.g., a murder weapon) among other, irrelevant items (controls), based on slower responses to the probe than to the controls. The present paper assesses the influence of test length (due to practice, habituation, or fatigue) on two key variables in the RT-CIT: (a) probe-control differences and (b) classification accuracy, through a meta-analysis of 12 previous experiments as well as two new experiments. It is consistently demonstrated that increased test length decreases probe-control differences but increases classification accuracy. The main implication for real-life application is that using at least around 600 trials in total is optimal for the RT-CIT.
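
The two quantities tracked in the paper can be illustrated with a minimal per-person computation; the response-time distributions and the 20 ms decision cutoff below are illustrative assumptions, not values from the meta-analysis.

```python
# Sketch: per-person probe-control RT difference and a simple cutoff-based decision.
import numpy as np

def probe_control_effect(probe_rts, control_rts):
    """Mean probe RT minus mean control RT, in milliseconds."""
    return np.mean(probe_rts) - np.mean(control_rts)

rng = np.random.default_rng(0)
# One simulated "guilty" examinee: probe responses ~40 ms slower on average.
controls = rng.normal(500, 60, size=120)
probes = rng.normal(540, 60, size=30)
effect = probe_control_effect(probes, controls)
cutoff = 20.0    # illustrative decision threshold in ms
print(effect, "probe recognized" if effect > cutoff else "no recognition detected")
```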


Author(s): Aritra Ghosh, Andrew Lan

Computerized adaptive testing (CAT) refers to a form of testing that is personalized to every student/test taker. CAT methods adaptively select the next most informative question/item for each student given their responses to previous questions, effectively reducing test length. Existing CAT methods use item response theory (IRT) models to relate student ability to responses, together with static question selection algorithms designed to reduce the ability estimation error as quickly as possible; as a result, these algorithms cannot improve by learning from large-scale student response data. In this paper, we propose BOBCAT, a Bilevel Optimization-Based framework for CAT that directly learns a data-driven question selection algorithm from training data. BOBCAT is agnostic to the underlying student response model and is computationally efficient during the adaptive testing process. Through extensive experiments on five real-world student response datasets, we show that BOBCAT outperforms existing CAT methods (sometimes significantly) at reducing test length.
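
For contrast with the learned policy, a minimal sketch of the static, maximum-information selection rule that such data-driven methods are typically compared against is given below; the 2PL item parameters are randomly generated for illustration and do not come from the paper.

```python
# Sketch: static maximum-Fisher-information item selection under the 2PL.
import numpy as np

def fisher_info(a, b, theta):
    """2PL item information at ability theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

def select_next_item(a, b, theta_hat, administered):
    """Pick the unadministered item most informative at the current ability estimate."""
    info = fisher_info(a, b, theta_hat)
    info[list(administered)] = -np.inf    # never re-administer an item
    return int(np.argmax(info))

rng = np.random.default_rng(0)
a, b = rng.uniform(0.8, 2.0, 100), rng.normal(0.0, 1.0, 100)
print(select_next_item(a, b, theta_hat=0.5, administered={3, 41}))
```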


2021, Vol 11
Author(s): Sedat Sen, Allan S. Cohen

Results of a comprehensive simulation study are reported investigating the effects of sample size, test length, number of attributes, and base rate of mastery on item parameter recovery and classification accuracy for four DCMs (the C-RUM, DINA, DINO, and LCDM-REDUCED). Effects were evaluated using bias and RMSE computed between the true (i.e., generating) parameters and the estimated parameters. Effects of the simulated factors on attribute assignment were also evaluated using the percentage of classification accuracy. More precise item parameter estimates were obtained with larger sample sizes and longer test lengths. Recovery of item parameters decreased as the number of attributes increased from three to five, while the base rate of mastery had a varying effect on item recovery. Item parameter recovery and classification accuracy were higher for the DINA and DINO models.
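
The recovery metrics named above (bias, RMSE, and percent classification accuracy) can be sketched as follows; the true and estimated values are fabricated purely to show the computations and are not results from the study.

```python
# Sketch: bias and RMSE for item parameters, and exact-match classification
# accuracy for attribute profiles, as used to evaluate DCM recovery.
import numpy as np

def bias(est, true):
    return np.mean(est - true)

def rmse(est, true):
    return np.sqrt(np.mean((est - true) ** 2))

def classification_accuracy(est_profiles, true_profiles):
    """Percent of examinees whose full attribute profile is recovered exactly."""
    return 100.0 * np.mean(np.all(est_profiles == true_profiles, axis=1))

# Toy check with made-up true/estimated values (3 attributes, 200 examinees).
rng = np.random.default_rng(0)
true_params = rng.normal(0.0, 1.0, 50)
est_params = true_params + rng.normal(0.0, 0.1, 50)
true_alpha = rng.integers(0, 2, size=(200, 3))
est_alpha = true_alpha.copy()
est_alpha[:10] = 1 - est_alpha[:10]    # misclassify the first 10 profiles
print(bias(est_params, true_params), rmse(est_params, true_params),
      classification_accuracy(est_alpha, true_alpha))
```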

