item parameter
Recently Published Documents

Total documents: 193 (five years: 57)
H-index: 23 (five years: 2)

Author(s):  
Riswan Riswan

Item Response Theory (IRT) models contain one or more parameters whose values are unknown, so they must be estimated. This paper aims (1) to determine the effect of sample size (N) on the stability of item parameter estimates, (2) to determine the effect of test length (n) on the stability of examinee parameter estimates, (3) to determine the effect of the model on the stability of item and examinee parameter estimates, (4) to determine the joint effect of sample size and test length on the stability of item and examinee parameter estimates, and (5) to determine the joint effect of sample size, test length, and model on the stability of item and examinee parameter estimates. This paper reports a simulation study in which samples of the latent trait (θ) were drawn from a standard normal population, θ ~ N(0, 1), for specified sample sizes (N) and test lengths (n) under the 1PL, 2PL, and 3PL models using WinGen. Item analysis was carried out with both the classical test theory approach and modern test theory (Item Response Theory), and the data were analyzed in R with the ltm package. The results showed that the larger the sample size (N), the more stable the item parameter estimates, and the greater the test length (n), the more stable the examinee parameter (θ) estimates.
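As a rough illustration of the simulation design described above (outside WinGen and the ltm package), the following Python sketch simulates dichotomous responses for θ ~ N(0, 1) under the Rasch (1PL) model and recovers item difficulties by marginal maximum likelihood on a quadrature grid. The sample sizes, item difficulties, and grid settings are illustrative assumptions, not the study's values.

```python
# Minimal sketch (not the WinGen/ltm pipeline from the paper): simulate Rasch (1PL)
# responses for theta ~ N(0, 1) and recover item difficulties by marginal maximum
# likelihood on a quadrature grid, to show that recovery stabilises as N grows.
# All sample sizes and item difficulties below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def simulate_rasch(theta, b):
    """Bernoulli responses under P(x = 1) = 1 / (1 + exp(-(theta - b)))."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def mml_difficulties(data, nodes=61):
    """Estimate Rasch difficulties by marginal ML with a normal quadrature grid."""
    grid = np.linspace(-5, 5, nodes)            # quadrature nodes for theta
    w = np.exp(-0.5 * grid ** 2)
    w /= w.sum()                                # N(0, 1) weights, normalised

    def negloglik(b):
        p = 1.0 / (1.0 + np.exp(-(grid[:, None] - b[None, :])))        # nodes x items
        loglik = data @ np.log(p).T + (1 - data) @ np.log(1 - p).T     # persons x nodes
        marg = (np.exp(loglik) * w).sum(axis=1)                        # marginalise over theta
        return -np.sum(np.log(marg))

    return minimize(negloglik, np.zeros(data.shape[1]), method="L-BFGS-B").x

b_true = np.linspace(-2, 2, 10)                 # 10 illustrative item difficulties
for N in (100, 500, 2000):                      # illustrative sample sizes
    theta = rng.standard_normal(N)
    b_hat = mml_difficulties(simulate_rasch(theta, b_true))
    rmse = np.sqrt(np.mean((b_hat - b_true) ** 2))
    print(f"N = {N:5d}  RMSE of difficulty estimates = {rmse:.3f}")
```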


2021 ◽  
Author(s):  
Angély Loubert ◽  
Antoine Regnault ◽  
Véronique Sébille ◽  
Jean-Benoit Hardouin

Abstract Background: In the analysis of clinical trial endpoints, calibration of patient-reported outcome (PRO) instruments ensures that the resulting “scores” represent the same quantity of the measured concept between applications. Rasch measurement theory (RMT) is a psychometric approach that guarantees algebraic separation of person and item parameter estimates, allowing formal calibration of PRO instruments. In the RMT framework, calibration is performed using the item parameter estimates obtained from a previous “calibration” study. But if calibration is based on poorly estimated item parameters (e.g., because the calibration sample was small), this may hamper the ability to detect a treatment effect, and direct estimation of item parameters from the trial data (non-calibration) may then be preferred. The objective of this simulation study was to assess the impact of calibration on the comparison of PRO results between treatment groups, using different analysis methods. Methods: PRO results were simulated following a polytomous Rasch model, for a calibration and a trial sample. Scenarios included varying sample sizes, instruments with varying numbers of items and response categories, and varying item parameter distributions. Different treatment effect sizes and distributions of the two patient samples were also explored. Comparison of treatment groups was performed using different methods based on a random-effect Rasch model. Calibrated and non-calibrated approaches were compared in terms of type-I error, power, bias, and variance of the estimates of the difference between groups. Results: There was no impact of the calibration approach on type-I error, power, bias, or dispersion of the estimates. Among other findings, mistargeting between the PRO instrument and the patients from the trial sample (regarding the level of the measured concept) resulted in lower power and higher bias than appropriate targeting. Conclusions: Calibration of PROs in clinical trials does not compromise the ability to accurately assess a treatment effect and is essential to properly interpret PRO results. Given its important added value, calibration should thus always be performed in the RMT framework when a PRO instrument is used as an endpoint in a clinical trial.
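The calibration idea at stake can be sketched in simplified form: person estimates for the trial are computed with item parameters fixed at values from a previous calibration study, rather than re-estimated from the trial data. The sketch below uses a dichotomous Rasch model and plain ML person scoring instead of the paper's polytomous Rasch model and random-effect comparison methods, and all parameter values (difficulties, group sizes, treatment effect) are illustrative assumptions.

```python
# Simplified sketch of "calibration": trial participants are scored with item
# difficulties fixed at values assumed to come from a prior calibration study.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
b_calibrated = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])   # difficulties from a prior study (assumed)

def simulate(theta, b):
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def ml_theta(x, b):
    """Person ML estimate under the Rasch model with fixed (calibrated) difficulties.
    Perfect and zero scores are simply truncated at the search bounds."""
    def nll(theta):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return minimize_scalar(nll, bounds=(-6, 6), method="bounded").x

# Two trial arms with a hypothetical treatment effect of 0.5 on the latent scale
theta_ctrl = rng.normal(0.0, 1.0, 200)
theta_trt = rng.normal(0.5, 1.0, 200)
scores_ctrl = [ml_theta(x, b_calibrated) for x in simulate(theta_ctrl, b_calibrated)]
scores_trt = [ml_theta(x, b_calibrated) for x in simulate(theta_trt, b_calibrated)]
print("estimated group difference:", np.mean(scores_trt) - np.mean(scores_ctrl))
```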


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 12-13
Author(s):  
Hao Luo ◽  
Björn Andersson ◽  
Gloria H Y Wong ◽  
Terry Y S Lum

Abstract Background The Montreal Cognitive Assessment (MoCA) has come to be widely used in longitudinal investigations to measure changes in cognition. However, the longitudinal measurement properties of the MoCA have not been investigated. We aimed to examine the measurement invariance of individual MoCA items across four time points. Methods We used longitudinal data collected between 2014 and 2017 from a cohort study on the health and well-being of older adults in Hong Kong. The Cantonese version of the MoCA was used. We applied multiple-group confirmatory factor analysis of ordinal variables to examine measurement invariance by educational level and across time points. Invariant items were identified by sequential model comparisons. Results We included 1029 participants who answered the MoCA items at all time points. We found that the Cube, Clock Hand, and Clock Number items had significantly different item parameters between participants with and without formal education at all time points. The selected model (RMSEA = 0.031; SRMR = 0.064) indicated that eight items (Trail, Cube, Clock Shape, Clock Number, Clock Hand, Abstraction, Short-term Memory, and Orientation) did not exhibit measurement invariance over time. However, the differences in item parameter estimates over time were marginal. Accounting for the lack of measurement invariance did not substantially affect classification properties based on cutoff values at the 2nd (major neurocognitive disorder) and 7th (mild cognitive impairment) percentiles. Conclusion Our findings support using the MoCA to assess changes in cognition over time in the study population. Future research should examine the longitudinal measurement properties of the test in other populations with different characteristics.
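The sequential model comparisons mentioned above rest on contrasting a model that constrains item parameters to be equal across groups or time points with a less constrained model. The sketch below shows only that comparison logic as a plain chi-square difference test with placeholder fit values; an actual analysis of ordinal MoCA items would use the scaled difference statistics appropriate for the ordinal CFA estimator.

```python
# Sketch of the nested-model comparison logic used to flag non-invariant items.
# The chi-square and df values below are placeholder numbers, not MoCA results.
from scipy.stats import chi2

def chisq_difference_test(chisq_constrained, df_constrained, chisq_free, df_free):
    """Return the chi-square difference statistic, its df, and the p-value."""
    d_chisq = chisq_constrained - chisq_free
    d_df = df_constrained - df_free
    return d_chisq, d_df, chi2.sf(d_chisq, d_df)

print(chisq_difference_test(chisq_constrained=250.4, df_constrained=120,
                            chisq_free=238.1, df_free=116))
```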


Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 814-836
Author(s):  
Alexander Robitzsch

The Rasch model is one of the most prominent item response models. In this article, different item parameter estimation methods for the Rasch model are systematically compared in a comprehensive simulation study: several variants of joint maximum likelihood (JML) estimation, several variants of marginal maximum likelihood (MML) estimation, conditional maximum likelihood (CML) estimation, and several limited-information methods (LIM). The type of ability distribution (i.e., nonnormality), the number of items, the sample size, and the distribution of item difficulties were systematically varied. Across the simulation conditions, MML methods with flexible distributional specifications can be at least as efficient as CML. Moreover, in many situations (i.e., for long tests), penalized JML and JML with ε adjustment resulted in very efficient estimates and might be considered alternatives to the JML implementations currently used in statistical software. Finally, minimum chi-square (MINCHI) estimation was the best-performing LIM method. These findings demonstrate that JML estimation and LIM can still prove helpful in applied research.
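As a concrete reference point for one of the estimator families compared above, the sketch below implements bare-bones joint maximum likelihood (JML) for the Rasch model, alternating Newton updates for person and item parameters. It includes none of the penalization or ε-adjustment variants examined in the article, and the simulated data are an illustrative assumption.

```python
# Bare-bones JML for the Rasch model: alternate Newton steps for persons and items.
import numpy as np

rng = np.random.default_rng(2)
theta_true = rng.standard_normal(500)
b_true = np.linspace(-1.5, 1.5, 12)            # already centred around zero
P = 1.0 / (1.0 + np.exp(-(theta_true[:, None] - b_true[None, :])))
X = (rng.random(P.shape) < P).astype(int)

# Drop persons with perfect or zero scores: their JML estimates do not exist.
keep = (X.sum(1) > 0) & (X.sum(1) < X.shape[1])
X = X[keep]

theta = np.zeros(X.shape[0])
b = np.zeros(X.shape[1])
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    theta += (X - p).sum(1) / (p * (1 - p)).sum(1)      # Newton step for persons
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    b -= (X - p).sum(0) / (p * (1 - p)).sum(0)          # Newton step for items
    b -= b.mean()                                       # identify the scale: sum of b = 0

print("max abs error in item difficulties:", np.abs(b - b_true).max())
```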


2021 ◽  
Vol 6 ◽  
Author(s):  
Shenghai Dai ◽  
Thao Thu Vo ◽  
Olasunkanmi James Kehinde ◽  
Haixia He ◽  
Yu Xue ◽  
...  

The implementation of polytomous item response theory (IRT) models such as the graded response model (GRM) and the generalized partial credit model (GPCM) to inform instrument design and validation has been increasing across social and educational contexts where rating scales are commonly used. The performance of such models has not been fully investigated and compared under conditions with common survey-specific characteristics such as short test length, small sample size, and data missingness. The purpose of the current simulation study is to inform the literature and guide the implementation of the GRM and the GPCM under these conditions. For item parameter estimation, the results suggest a sample size of at least 300 and/or an instrument length of at least five items for both models. The performance of the GPCM is stable across instrument lengths, while that of the GRM improves notably as the instrument length increases. For person parameters, the GRM yields more accurate estimates when the proportion of missing data is small, whereas the GPCM is favored in the presence of a large amount of missingness. Further, it is not recommended to compare the GRM and the GPCM based on test information. Relative model fit indices (AIC, BIC, LL) may not be powerful when the sample size is less than 300 and the instrument length is less than five items. A synthesis of the patterns in the results, as well as recommendations for the implementation of polytomous IRT models, is presented and discussed.
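For readers unfamiliar with the two models, the sketch below writes out their category probability functions for a single illustrative item: the GRM derives category probabilities from differences of cumulative boundary curves, whereas the GPCM derives them from adjacent-category (step) comparisons. The item parameters and person location are assumed values for illustration only.

```python
# Category probabilities for one four-category item under the GRM and the GPCM,
# with illustrative (assumed) parameters.
import numpy as np

def grm_probs(theta, a, thresholds):
    """GRM: P(X >= k) = sigmoid(a * (theta - b_k)); category probs are differences."""
    cum = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(thresholds))))
    cum = np.concatenate(([1.0], cum, [0.0]))
    return -np.diff(cum)

def gpcm_probs(theta, a, steps):
    """GPCM: P(X = k) is proportional to exp(sum over v <= k of a * (theta - d_v))."""
    z = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(steps)))))
    ez = np.exp(z - z.max())
    return ez / ez.sum()

theta = 0.3                        # an illustrative person location
print("GRM :", grm_probs(theta, a=1.2, thresholds=[-1.0, 0.0, 1.0]))
print("GPCM:", gpcm_probs(theta, a=1.2, steps=[-1.0, 0.0, 1.0]))
```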


2021 ◽  
Vol 23 (4) ◽  
pp. 1509-1516
Author(s):  
Taeyoung Kim ◽  
Seungbae Choi ◽  
Hae-Gyung Yoon

2021 ◽  
Vol 12 ◽  
Author(s):  
Wenyi Wang ◽  
Yukun Tu ◽  
Lihong Song ◽  
Juanjuan Zheng ◽  
Teng Wang

The implementation of cognitive diagnostic computerized adaptive testing often depends on a high-quality item bank. How to estimate item parameters online and calibrate the Q-matrix entries required by new items is therefore an important problem in constructing such a bank for personalized adaptive learning. Previous research has mainly focused on calibration under a random design, in which new items are randomly assigned to examinees. Although randomly assigning new items ensures the randomness of data sampling, some examinees provide little information for item parameter estimation or Q-matrix calibration of the new items. In order to increase design efficiency, we investigated three adaptive designs for different practical situations: (a) because the non-parametric classification method needs calibrated item attribute vectors but not item parameters, the first study focused on an optimal design, based on Shannon entropy, for calibrating the Q-matrix of the new items; (b) if the Q-matrix of the new items has been specified by subject experts, an optimal design based on Fisher information was developed for estimating the item parameters; and (c) if both the Q-matrix and the item parameters of the new items are unknown, we developed a hybrid optimal design for estimating them simultaneously. The simulation results showed that the adaptive designs are better than the random design with a limited number of examinees in terms of the correct recovery rate of attribute vectors and the precision of the item parameters.
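The entropy-based design in (a) can be sketched as follows: among examinees whose attribute profiles have already been estimated, the new item is assigned to the examinee whose expected response most reduces the Shannon entropy of the posterior over candidate q-vectors. The DINA-style response model, the provisional guessing and slip values, and the uniform prior below are illustrative assumptions, not the paper's exact procedure.

```python
# Entropy-based assignment of a new item to the most informative examinee profile.
import itertools
import numpy as np

K = 3                                                 # number of attributes (assumed)
candidate_q = [np.array(q) for q in itertools.product([0, 1], repeat=K) if any(q)]
posterior_q = np.full(len(candidate_q), 1.0 / len(candidate_q))   # current beliefs about q
g, s = 0.2, 0.2                                       # provisional guessing / slip (assumed)

def p_correct(q, alpha):
    """DINA-style success probability for profile alpha if the item requires q."""
    eta = int(np.all(alpha >= q))
    return (1 - s) if eta else g

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def expected_posterior_entropy(alpha):
    """Expected entropy of the q-posterior after observing this examinee's response."""
    p1 = np.array([p_correct(q, alpha) for q in candidate_q])      # P(correct | q)
    total = 0.0
    for px_given_q in (p1, 1 - p1):                                # response = 1, then 0
        marginal = np.sum(posterior_q * px_given_q)                # P(that response)
        post = posterior_q * px_given_q / marginal                 # Bayes update of q
        total += marginal * entropy(post)
    return total

examinees = [np.array(a) for a in itertools.product([0, 1], repeat=K)]
best = min(examinees, key=expected_posterior_entropy)
print("assign the new item to an examinee with profile:", best)
```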


2021 ◽  
pp. 001316442110339
Author(s):  
Allison W. Cooperman ◽  
David J. Weiss ◽  
Chun Wang

Adaptive measurement of change (AMC) is a psychometric method for measuring intra-individual change on one or more latent traits across testing occasions. Three hypothesis tests—a Z test, likelihood ratio test, and score ratio index—have demonstrated desirable statistical properties in this context, including low false positive rates and high true positive rates. However, the extant AMC research has assumed that the item parameter values in the simulated item banks were devoid of estimation error. This assumption is unrealistic for applied testing settings, where item parameters are estimated from a calibration sample before test administration. Using Monte Carlo simulation, this study evaluated the robustness of the common AMC hypothesis tests to the presence of item parameter estimation error when measuring omnibus change across four testing occasions. Results indicated that item parameter estimation error had at most a small effect on false positive rates and latent trait change recovery, and these effects were largely explained by the computerized adaptive testing item bank information functions. Differences in AMC performance as a function of item parameter estimation error and choice of hypothesis test were generally limited to simulees with particularly low or high latent trait values, where the item bank provided relatively lower information. These simulations highlight how AMC can accurately measure intra-individual change in the presence of item parameter estimation error when paired with an informative item bank. Limitations and future directions for AMC research are discussed.
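The simplest of the three AMC hypothesis tests named above, the Z test, compares the latent trait estimates from two occasions using their standard errors. The sketch below shows this two-occasion version with placeholder estimates (in AMC these would come from adaptively administered tests scored with calibrated item parameters), whereas the study tested omnibus change across four occasions.

```python
# Two-occasion Z test for latent trait change; the inputs are placeholder values.
from math import sqrt
from scipy.stats import norm

def amc_z_test(theta1, se1, theta2, se2):
    """Z statistic and two-sided p-value for the change theta2 - theta1."""
    z = (theta2 - theta1) / sqrt(se1 ** 2 + se2 ** 2)
    return z, 2 * norm.sf(abs(z))

print(amc_z_test(theta1=-0.10, se1=0.30, theta2=0.55, se2=0.28))
```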


Psych ◽  
2021 ◽  
Vol 3 (3) ◽  
pp. 279-307
Author(s):  
Jan Steinfeld ◽  
Alexander Robitzsch

There is some debate in the psychometric literature about item parameter estimation in multistage designs. It is occasionally argued that the conditional maximum likelihood (CML) method is superior to the marginal maximum likelihood (MML) method because no assumptions have to be made about the trait distribution. However, CML estimation in its original formulation leads to biased item parameter estimates in multistage designs. Zwitser and Maris (2015, Psychometrika) proposed a modified conditional maximum likelihood estimation method for multistage designs that provides practically unbiased item parameter estimates. In this article, the differences between estimation approaches for multistage designs were investigated in a simulation study. Four estimation conditions (CML, CML taking the respective MST design into account, MML assuming a normal distribution, and MML with log-linear smoothing) were examined, considering different multistage designs, numbers of items, sample sizes, and trait distributions. The results showed that, in the case of a substantial violation of the normal distribution, the CML method seemed preferable to MML estimation employing a misspecified normal trait distribution, especially as the number of items and the sample size increased. However, MML estimation using log-linear smoothing led to results that were very similar to the CML method taking the respective MST design into account.
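To make the contrast with MML concrete, the sketch below shows the core of CML estimation for the Rasch model on complete data: the likelihood is conditioned on each person's raw score, which removes the person parameters and leaves the item parameters inside elementary symmetric functions. The simulated complete-data setting is an illustrative assumption and does not reproduce the multistage-design modification of Zwitser and Maris.

```python
# CML for the Rasch model on complete data via elementary symmetric functions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
b_true = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
theta = rng.standard_normal(1000)
P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b_true[None, :])))
X = (rng.random(P.shape) < P).astype(int)
scores = X.sum(axis=1)                      # raw scores to condition on

def elementary_symmetric(eps):
    """gamma[r] = sum over response patterns with raw score r of the product of eps_j."""
    gamma = np.array([1.0])
    for e in eps:
        gamma = np.concatenate((gamma, [0.0])) + np.concatenate(([0.0], gamma * e))
    return gamma

def neg_conditional_loglik(b):
    gamma = elementary_symmetric(np.exp(-b))
    # conditional log-likelihood: sum_i [ -x_i . b - log gamma_{r_i} ]
    return np.sum(X @ b) + np.sum(np.log(gamma[scores]))

res = minimize(neg_conditional_loglik, np.zeros(len(b_true)), method="L-BFGS-B")
b_hat = res.x - res.x.mean()                # fix the scale by centring
print("CML difficulty estimates:", np.round(b_hat, 2))
```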

