Summed Score Likelihood–Based Indices for Testing Latent Variable Distribution Fit in Item Response Theory

2017, Vol. 78(5), pp. 857-886
Author(s): Zhen Li, Li Cai

In standard item response theory (IRT) applications, the latent variable is typically assumed to be normally distributed. If the normality assumption is violated, the item parameter estimates can become biased. Summed score likelihood–based statistics may be useful for testing latent variable distribution fit. We develop Satorra–Bentler type moment adjustments to approximate the test statistics’ tail-area probability. A simulation study was conducted to examine the calibration and power of the unadjusted and adjusted statistics under various simulation conditions. Results show that the proposed indices have tail-area probabilities that can be closely approximated by central chi-squared random variables under the null hypothesis. Furthermore, the test statistics are focused. They are powerful for detecting latent variable distributional assumption violations, and not sensitive (correctly) to other forms of model misspecification such as multidimensionality. As a comparison, the goodness-of-fit statistic M2 has considerably lower power against latent variable nonnormality than the proposed indices. Empirical data from a patient-reported health outcomes study are used as an illustration.
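For readers who want to see the machinery involved, the R sketch below shows how summed-score probabilities implied by a 2PL model under an assumed N(0, 1) latent distribution can be computed with the Lord–Wingersky recursion and compared with observed summed-score counts through a Pearson-type discrepancy. It is only a minimal illustration of the summed-score likelihood idea, not the authors' adjusted indices; the item parameters and observed counts are made up for the example.

```r
## Illustrative 2PL item parameters and observed summed-score counts; neither
## comes from the paper.
a <- c(1.2, 0.8, 1.5, 1.0, 0.9)
b <- c(-0.5, 0.0, 0.5, 1.0, -1.0)

lw_recursion <- function(theta, a, b) {       # Lord-Wingersky recursion
  p <- plogis(a * (theta - b))                # 2PL correct-response probabilities
  s <- c(1 - p[1], p[1])                      # summed-score distribution after item 1
  for (j in 2:length(p)) {
    s <- c(s * (1 - p[j]), 0) + c(0, s * p[j])
  }
  s                                           # P(summed score = 0..J | theta)
}

## Rectangular quadrature over the assumed N(0, 1) latent distribution
theta <- seq(-4, 4, length.out = 61)
w     <- dnorm(theta); w <- w / sum(w)

## Model-implied (marginal) summed-score probabilities
expected <- rowSums(sapply(seq_along(theta),
                           function(q) lw_recursion(theta[q], a, b) * w[q]))

## Pearson-type discrepancy against observed summed-score counts (scores 0..5)
obs <- c(40, 110, 210, 260, 230, 150)
X2  <- sum((obs - sum(obs) * expected)^2 / (sum(obs) * expected))
```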

2019, Vol. 45(4), pp. 383-402
Author(s): Paul A. Jewsbury, Peter W. van Rijn

In large-scale educational assessment data consistent with a simple-structure multidimensional item response theory (MIRT) model, where every item measures only one latent variable, separate unidimensional item response theory (UIRT) models for each latent variable are often calibrated for practical reasons. While this approach can be valid for data from a linear test, unacceptable item parameter estimates are obtained when data arise from a multistage test (MST). We explore this situation from a missing data perspective and show mathematically that MST data will be problematic for calibrating multiple UIRT models but not MIRT models. This occurs because some items that were used in the routing decision are excluded from the separate UIRT models, due to measuring a different latent variable. Both simulated and real data from the National Assessment of Educational Progress are used to further confirm and explore the unacceptable item parameter estimates. The theoretical and empirical results confirm that only MIRT models are valid for item calibration of multidimensional MST data.
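A small R sketch of the mechanism (illustrative, not the authors' design): simple-structure two-dimensional data with a routing rule that uses stage-1 items from both dimensions, so the missingness in the stage-2 items of dimension 1 depends on items a separate UIRT calibration would never see.

```r
## Illustrative simulation, not the authors' design: routing uses the stage-1
## total over items from BOTH dimensions.
set.seed(1)
N <- 2000
theta <- matrix(rnorm(2 * N), N, 2)           # two independent latent traits

gen_items <- function(th, b) {                # Rasch-type response generator
  matrix(rbinom(length(th) * length(b), 1, plogis(outer(th, b, "-"))),
         ncol = length(b))
}

## Stage 1 (routing module): five items per dimension
r1 <- gen_items(theta[, 1], seq(-1, 1, length.out = 5))
r2 <- gen_items(theta[, 2], seq(-1, 1, length.out = 5))
hard <- rowSums(cbind(r1, r2)) >= 5           # route on the combined stage-1 total

## Stage 2: dimension-1 items only, easy or hard module depending on the route
s2_easy <- gen_items(theta[, 1], rep(-1, 5))
s2_hard <- gen_items(theta[, 1], rep( 1, 5))
s2_easy[hard, ]  <- NA                        # high scorers skip the easy module
s2_hard[!hard, ] <- NA                        # low scorers skip the hard module

## A separate UIRT calibration of dimension 1 would use cbind(r1, s2_easy,
## s2_hard) only; because the routing rule also depends on r2, the missing
## stage-2 responses are not ignorable for that model, while the full MIRT
## model retains all routing items and remains valid.
```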


2017, Vol. 43(3), pp. 259-285
Author(s): Yang Liu, Ji Seung Yang

The uncertainty arising from item parameter estimation is often not negligible and must be accounted for when calculating latent variable (LV) scores in item response theory (IRT). This is particularly the case when the calibration sample size is limited and/or the calibration IRT model is complex. In the current work, we treat two-stage IRT scoring as a predictive inference problem: The target of prediction is a random variable that follows the true posterior of the LV conditional on the response pattern being scored. Various Bayesian, fiducial, and frequentist prediction intervals of LV scores, which can be obtained from a simple yet generic Monte Carlo recipe, are evaluated and contrasted via simulations based on several measures of prediction quality. An empirical data example is also presented to illustrate the use of candidate methods.
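A minimal R sketch of one such Monte Carlo recipe, under assumed values: draw item parameters from an approximate sampling distribution around the calibration estimates, draw the latent variable from its posterior given the response pattern at each parameter draw, and take percentiles of the pooled draws as a prediction interval. The 2PL parameters, covariance matrix, and response pattern below are placeholders, not values from the paper.

```r
## Placeholder 2PL estimates, covariance, and response pattern; in practice
## `est` and `V` come from the item calibration stage.
set.seed(2)
J    <- 4
est  <- c(rep(1, J), c(-1, 0, 0.5, 1))        # stacked (a1..aJ, b1..bJ) estimates
V    <- diag(0.02, 2 * J)                     # asymptotic covariance (placeholder)
resp <- c(1, 1, 0, 1)                         # response pattern being scored

grid  <- seq(-4, 4, length.out = 81)
R     <- 2000
draws <- numeric(R)
L <- chol(V)

for (r in 1:R) {
  par <- est + drop(rnorm(2 * J) %*% L)       # item parameter draw ~ N(est, V)
  a <- par[1:J]; b <- par[(J + 1):(2 * J)]
  p <- plogis(sweep(outer(grid, b, "-"), 2, a, "*"))
  loglik <- log(p) %*% resp + log(1 - p) %*% (1 - resp)
  post <- drop(exp(loglik)) * dnorm(grid)
  post <- post / sum(post)
  draws[r] <- sample(grid, 1, prob = post)    # latent variable draw from the posterior
}

quantile(draws, c(0.025, 0.975))              # 95% prediction interval for the LV score
```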


Author(s): Riswan Riswan

An item response theory (IRT) model contains one or more parameters, and because these parameters are unknown they must be estimated. This paper aims (1) to determine the effect of sample size (N) on the stability of item parameter estimates, (2) to determine the effect of test length (n) on the stability of examinee parameter estimates, (3) to determine the effect of the IRT model on the stability of item and examinee parameter estimates, (4) to determine the joint effect of sample size and test length on item and examinee parameter estimates, and (5) to determine the joint effect of sample size, test length, and model on item and examinee parameter estimates. The paper reports a simulation study in which latent trait (θ) samples are drawn from a standard normal population, θ ~ N(0, 1), for specific sample sizes (N) and test lengths (n) under the 1PL, 2PL, and 3PL models, with data generated in WinGen. Item analysis was carried out using both the classical test theory approach and modern test theory (IRT), and the data were analyzed in R with the ltm package. The results show that the larger the sample size (N), the more stable the item parameter estimates, and the greater the test length (n), the more stable the examinee parameter (θ) estimates.
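One cell of such a simulation can be sketched in R roughly as follows (illustrative generating values; the ltm package is used for calibration as in the paper, assuming its usual 2PL formula interface). Repeating the cell over a grid of N and n values traces the stability pattern described above.

```r
## Illustrative generating values for one (N, n) cell; ltm is assumed to be
## installed and to follow its usual 2PL formula interface.
library(ltm)

set.seed(3)
N <- 1000                                     # sample size
n <- 20                                       # test length
a_true <- runif(n, 0.8, 2.0)                  # generating discriminations
b_true <- rnorm(n)                            # generating difficulties
theta  <- rnorm(N)                            # latent trait drawn from N(0, 1)

p    <- plogis(sweep(outer(theta, b_true, "-"), 2, a_true, "*"))
resp <- matrix(rbinom(N * n, 1, p), N, n)

fit <- ltm(resp ~ z1)                         # 2PL calibration with the ltm package
est <- coef(fit)                              # columns: Dffclt (b), Dscrmn (a)

## Recovery summaries; repeating this over N in {250, 500, 1000, 2000} and
## n in {10, 20, 40} traces the stability pattern described in the abstract.
rmse_b <- sqrt(mean((est[, "Dffclt"] - b_true)^2))
rmse_a <- sqrt(mean((est[, "Dscrmn"] - a_true)^2))
```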


2020, pp. 107699862095376
Author(s): Scott Monroe

This research proposes a new statistic for testing latent variable distribution fit for unidimensional item response theory (IRT) models. If the typical assumption of normality is violated, then item parameter estimates will be biased, and dependent quantities such as IRT score estimates will be adversely affected. The proposed statistic compares the specified latent variable distribution to the sample average of latent variable posterior distributions commonly used in IRT scoring. Formally, the statistic is an instantiation of a generalized residual and is thus asymptotically distributed as standard normal. Also, the statistic naturally complements residual-based item-fit statistics, as both are conditional on the latent trait, and can be presented with graphical plots. In addition, a corresponding unconditional statistic, which controls for multiple comparisons, is proposed. The statistics are evaluated using a simulation study, and empirical analyses are provided.
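The core comparison behind the statistic can be sketched in R as follows (illustrative item parameters and data, and without the generalized-residual standard errors that make the statistic a formal test): average the examinees' posterior distributions over a quadrature grid and set the average against the specified N(0, 1) weights.

```r
## Illustrative item parameters and data; the generating latent distribution
## is deliberately skewed so the average posterior departs from N(0, 1).
set.seed(4)
a <- c(1.3, 0.9, 1.6, 1.1, 0.8)
b <- c(-1.0, -0.3, 0.2, 0.8, 1.4)
theta_true <- as.numeric(scale(rexp(500)))               # skewed latent sample
p_true <- plogis(sweep(outer(theta_true, b, "-"), 2, a, "*"))
resp   <- matrix(rbinom(500 * 5, 1, p_true), 500, 5)

grid  <- seq(-4, 4, length.out = 41)
prior <- dnorm(grid); prior <- prior / sum(prior)        # specified N(0, 1) weights

post_weights <- function(resp_i) {                       # posterior of theta on the grid
  p <- plogis(sweep(outer(grid, b, "-"), 2, a, "*"))
  w <- drop(exp(log(p) %*% resp_i + log(1 - p) %*% (1 - resp_i))) * prior
  w / sum(w)
}

avg_post <- rowMeans(apply(resp, 1, post_weights))       # sample average of posteriors

## Positive residuals mark regions where the data imply more latent density
## than the assumed normal; plotted against `grid`, they give the kind of
## graphical display mentioned above.
residual <- avg_post - prior
```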


2020, pp. 001316442094989
Author(s): Joseph A. Rios, James Soland

As low-stakes testing becomes more common, low test-taking effort can pose a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.g., two-parameter logistic [2PL]) in parameter estimation under simulated conditions, prior research has failed to examine its performance under violations to the model’s assumptions. Therefore, the objective of this simulation study was to examine item and mean ability parameter recovery when violating the assumptions that noneffortful responding occurs randomly (Assumption 1) and is unrelated to the underlying ability of examinees (Assumption 2). Results demonstrated that, across conditions, the EM-IRT model provided item parameter estimates that were robust to violations of Assumption 1. However, bias values greater than 0.20 SDs were observed for the EM-IRT model when violating Assumption 2; nonetheless, these values were still lower than those from the 2PL model. In terms of mean ability estimates, model results indicated equal performance between the EM-IRT and 2PL models across conditions. Across both models, mean ability estimates were found to be biased by more than 0.25 SDs when violating Assumption 2. However, our accompanying empirical study suggested that this bias arose under extreme conditions that may not be present in some operational settings. Overall, these results suggest that the EM-IRT model provides superior item and equal mean ability parameter estimates in the presence of model violations under realistic conditions when compared with the 2PL model.
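The effort-moderated idea can be sketched for the scoring of a single examinee as follows (R, illustrative 2PL parameters; in practice the effort flags come from response-time thresholds): responses flagged as noneffortful simply drop out of the likelihood, as if those items had not been administered.

```r
## Illustrative 2PL parameters, responses, and effort flags; in practice the
## flags come from a response-time threshold.
a <- c(1.2, 0.9, 1.4, 1.0, 1.1)
b <- c(-0.8, -0.2, 0.3, 0.9, 1.5)
resp   <- c(1, 1, 0, 0, 0)
effort <- c(1, 1, 1, 0, 0)                    # 0 = flagged as noneffortful (rapid guess)

em_loglik <- function(theta) {
  p <- plogis(a * (theta - b))
  sum(effort * (resp * log(p) + (1 - resp) * log(1 - p)))  # flagged items contribute 0
}

grid <- seq(-4, 4, length.out = 81)
post <- exp(sapply(grid, em_loglik)) * dnorm(grid)
post <- post / sum(post)
eap_em <- sum(grid * post)                    # effort-moderated EAP score

## Setting `effort` to all 1s recovers the ordinary 2PL likelihood, the
## comparison model in the study summarized above.
```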


1992, Vol. 17(2), pp. 155-173
Author(s): Kentaro Yamamoto, John Mazzeo

In educational assessments, it is often necessary to compare the performance of groups of individuals who have been administered different forms of a test. If these groups are to be validly compared, all results need to be expressed on a common scale. When assessment results are to be reported using an item response theory (IRT) proficiency metric, as is done for the National Assessment of Educational Progress (NAEP), establishing a common metric becomes synonymous with expressing IRT item parameter estimates on a common scale. Procedures that accomplish this are referred to here as scale linking procedures. This chapter discusses the need for scale linking in NAEP and illustrates the specific procedures used to carry out the linking in the context of the major analyses conducted for the 1990 NAEP mathematics assessment.
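As a simple illustration of what expressing item parameters on a common scale involves, the R sketch below applies mean/sigma linking to common-item difficulty estimates; NAEP's operational linking procedures are more elaborate, and the parameter values here are made up.

```r
## Illustrative estimates of the same common items on two scales; not NAEP data.
b_base <- c(-1.10, -0.42, 0.15, 0.88, 1.35)   # difficulties on the base scale
b_new  <- c(-0.95, -0.30, 0.28, 0.96, 1.50)   # difficulties on the new scale
a_new  <- c( 1.20,  0.85, 1.40, 1.05, 0.95)   # discriminations on the new scale

A <- sd(b_base) / sd(b_new)                   # slope of the linear scale transformation
B <- mean(b_base) - A * mean(b_new)           # intercept

b_linked     <- A * b_new + B                 # difficulties expressed on the base scale
a_linked     <- a_new / A                     # discriminations expressed on the base scale
theta_linked <- function(theta_new) A * theta_new + B   # proficiency transformation
```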


2018
Author(s): Maxwell Hong, Alison Cheng

Self-report data are common in psychological and survey research. Unfortunately, many of these samples are plagued with careless responses from unmotivated participants. The purpose of this study is to propose and evaluate a robust estimation method for detecting careless, or unmotivated, responders while leveraging item response theory (IRT) person fit statistics. First, we outline a general framework for robust estimation specific to IRT models. Subsequently, we conduct a simulation study covering multiple conditions to evaluate the performance of the proposed method. Ultimately, we show how robust maximum marginal likelihood (RMML) estimation significantly improves detection rates for careless responders and reduces bias in item parameters across conditions. Furthermore, we apply our method to a real dataset to illustrate its utility. Our findings suggest that robust estimation coupled with person fit statistics offers a powerful procedure to identify careless respondents for further review and to provide more accurate item parameter estimates in the presence of careless responses.
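A standard ingredient of this kind of procedure is a person-fit statistic such as the standardized l_z index, sketched below in R (this is not the authors' full RMML estimator; the item parameters and response patterns are illustrative). Large negative values flag aberrant patterns, such as careless responding, for downweighting or review.

```r
## Standardized person-fit statistic l_z for a 2PL model; illustrative
## parameters and patterns, not the authors' procedure.
lz <- function(resp, theta, a, b) {
  p  <- plogis(a * (theta - b))
  l0 <- sum(resp * log(p) + (1 - resp) * log(1 - p))      # observed log-likelihood
  e  <- sum(p * log(p) + (1 - p) * log(1 - p))            # its expectation
  v  <- sum(p * (1 - p) * log(p / (1 - p))^2)             # its variance
  (l0 - e) / sqrt(v)
}

a <- c(1.3, 1.0, 1.5, 0.9, 1.2, 1.1)
b <- c(-2.0, -1.0, -0.3, 0.3, 1.0, 2.0)
lz(c(1, 1, 1, 1, 0, 0), theta = 0, a = a, b = b)   # consistent pattern
lz(c(0, 0, 1, 0, 1, 1), theta = 0, a = a, b = b)   # aberrant (careless-like) pattern
```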


Psychometrika, 2021
Author(s): Ron D. Hays, Karen L. Spritzer, Steven P. Reise

The reliable change index has been used to evaluate the significance of individual change in health-related quality of life. We estimate reliable change for two measures (physical function and emotional distress) in the Patient-Reported Outcomes Measurement Information System (PROMIS®) 29-item health-related quality of life measure (PROMIS-29 v2.1). Using two waves of data collected 3 months apart in a longitudinal observational study of chronic low back pain and chronic neck pain patients receiving chiropractic care, and simulations, we compare estimates of reliable change from classical test theory fixed standard errors with item response theory standard errors from the graded response model. We find that unless true change in the PROMIS physical function and emotional distress scales is substantial, classical test theory estimates of significant individual change are much more optimistic than estimates of change based on item response theory.
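The comparison comes down to which standard errors enter the reliable change index, as the R sketch below illustrates with made-up numbers (not PROMIS estimates): classical test theory uses one fixed standard error of measurement for every score, whereas IRT supplies score-specific standard errors from the test information function.

```r
## Reliable change index with made-up T-score values; not PROMIS estimates.
reliable_change <- function(t1, t2, se1, se2) {
  (t2 - t1) / sqrt(se1^2 + se2^2)             # conventionally significant at |RCI| > 1.96
}

## Classical test theory: one fixed SEM = SD * sqrt(1 - reliability) at both waves
sem_ctt <- 10 * sqrt(1 - 0.90)
reliable_change(t1 = 40, t2 = 45, se1 = sem_ctt, se2 = sem_ctt)

## IRT: conditional standard errors at each wave's score (e.g., from the graded
## response model's test information), typically larger away from the scale center
reliable_change(t1 = 40, t2 = 45, se1 = 4.5, se2 = 4.0)
```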

