Robust Maximum Marginal Likelihood (RMML) Estimation for Item Response Theory Models

Mapping Intimacies ◽

10.31234/osf.io/v6us8 ◽

2018 ◽

Author(s):

Maxwell Hong ◽

Alison Cheng

Keyword(s):

Item Response Theory ◽

Item Response ◽

Robust Estimation ◽

Marginal Likelihood ◽

Estimation Method ◽

Self Report ◽

Item Parameter ◽

Parameter Estimates ◽

Response Theory ◽

Detection Rates

Self-report data are common in psychological and survey research. Unfortunately, many of these samples are plagued with careless responses due to unmotivated participants. The purpose of this study is to propose and evaluate a robust estimation method in order to detect careless, or unmotivated, responders while leveraging Item Response Theory (IRT) person fit statistics. First, we outline a general framework for robust estimation specific for IRT models. Subsequently, we conduct a simulation study covering multiple conditions to evaluate the performance of the proposed method. Ultimately, we show how robust maximum marginal likelihood (RMML) estimation significantly improves detection rates for careless responders and reduces bias in item parameters across conditions. Furthermore, we apply our method to a real dataset to illustrate the utility of the proposed method. Our findings suggest that robust estimation coupled with person fit statistics offers a powerful procedure to identify careless respondents for further review, and to provide more accurate item parameter estimates in the presence of careless responses.
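As a rough illustration of what a person fit statistic computes, the sketch below implements the standardized log-likelihood statistic lz for dichotomous items. The abstract does not say which person fit statistic the authors use, so lz is an assumption chosen only because it is a widely used example. The function name and the idea of passing in precomputed model probabilities are also hypothetical.

```python
import math

def lz_person_fit(responses, probs):
    """Standardized log-likelihood person-fit statistic (lz) for 0/1 responses.

    responses: list of 0/1 item scores for one respondent.
    probs: model-implied probabilities of a 1 on each item (assumed given,
           e.g. computed from previously estimated item parameters).
    """
    observed = 0.0   # log-likelihood of the observed response pattern
    expected = 0.0   # expectation of that log-likelihood under the model
    variance = 0.0   # variance of that log-likelihood under the model
    for x, p in zip(responses, probs):
        log_p, log_q = math.log(p), math.log(1.0 - p)
        observed += x * log_p + (1 - x) * log_q
        expected += p * log_p + (1.0 - p) * log_q
        variance += p * (1.0 - p) * (log_p - log_q) ** 2
    if variance == 0.0:
        raise ValueError("lz is undefined when the variance is zero "
                         "(e.g. all probabilities equal 0.5)")
    return (observed - expected) / math.sqrt(variance)
```

Large negative values indicate a response pattern that is less likely than the model expects, which is the kind of pattern a careless responder might produce. How the probabilities are obtained (for example, from robust rather than ordinary estimates) is outside this sketch.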


A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

Applied Psychological Measurement ◽

10.1177/0146621611428447 ◽

2011 ◽

Vol 35 (8) ◽

pp. 604-622 ◽

Cited By ~ 16

Author(s):

Hirotaka Fukuhara ◽

Akihito Kamata

Keyword(s):

Item Response Theory ◽

Differential Item Functioning ◽

Item Response ◽

Estimation Method ◽

Multidimensional Item Response Theory ◽

Multidimensional Item Response ◽

Response Theory ◽

Data Set ◽

Detection Rates ◽

Item Functioning

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into account, thus estimating DIF magnitude appropriately when a test is composed of testlets. A fully Bayesian estimation method was adopted for parameter estimation. The recovery of parameters was evaluated for the proposed DIF model. Simulation results revealed that the proposed bifactor MIRT DIF model produced better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model for all simulation conditions. A real data analysis was also conducted by applying the proposed DIF model to a statewide reading assessment data set.


Sample Size and Test Length for Item Parameter Estimate and Exam Parameter Estimate

Al-Khwarizmi Jurnal Pendidikan Matematika dan Ilmu Pengetahuan Alam ◽

10.24256/jpmipa.v9i1.2384 ◽

2021 ◽

Vol 9 (1) ◽

pp. 69-78

Author(s):

Riswan Riswan

Keyword(s):

Item Response Theory ◽

Sample Size ◽

Item Response ◽

Parameter Estimate ◽

Test Theory ◽

Item Parameter ◽

Parameter Estimates ◽

Test Length ◽

Response Theory ◽

The Stability

Item Response Theory (IRT) models contain one or more parameters. These parameters are unknown, so they must be estimated. This paper aims to determine (1) the effect of sample size (N) on the stability of item parameter estimates, (2) the effect of test length (n) on the stability of examinee parameter estimates, (3) the effect of the model on the stability of item and examinee parameter estimates, (4) the effect of sample size and test length on the stability of item and examinee parameter estimates, and (5) the effect of sample size, test length, and model on the stability of item and examinee parameter estimates. This is a simulation study in which the latent trait (θ) is drawn from a standard normal population, θ ~ N(0, 1), for specific sample sizes (N) and test lengths (n) under the 1PL, 2PL, and 3PL models, using WinGen. Item analysis was carried out using both classical test theory and modern test theory (item response theory), and the data were analyzed in R with the ltm package. The results showed that the larger the sample size (N), the more stable the estimated parameters. Likewise, the greater the test length (n), the more stable the estimated examinee parameter (θ).
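The simulation design described here can be sketched in a few lines. The study itself used WinGen and the R package ltm. The Python below is only an illustrative stand-in, and the function names, item parameter values, and seed are all hypothetical. It draws the latent trait from a standard normal distribution and generates 0/1 responses from the three-parameter logistic (3PL) response function, which reduces to the 2PL when c = 0 and to the 1PL when additionally a = 1.

```python
import math
import random

def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """3PL item response function: discrimination a, difficulty b, guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def simulate_responses(n_examinees, items, seed=0):
    """Simulate 0/1 responses for examinees with latent trait ~ N(0, 1).

    items: list of (a, b, c) tuples, one per item (illustrative values only).
    Returns the simulated latent traits and an n_examinees x n_items 0/1 matrix.
    """
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    data = [
        [1 if rng.random() < p_correct(theta, *item) else 0 for item in items]
        for theta in thetas
    ]
    return thetas, data

# Example: sample size N = 200, test length n = 3, mixing 1PL-, 2PL-, and 3PL-style items.
thetas, data = simulate_responses(200, [(1.0, 0.0, 0.0), (1.5, -0.5, 0.0), (1.2, 0.8, 0.2)])
```

Varying `n_examinees` and the length of `items`, then re-estimating the parameters on each simulated data set, is one way to study stability as a function of sample size and test length.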


Applications of the Analytically Derived Asymptotic Standard Errors of Item Response Theory Item Parameter Estimates

Journal of Educational Measurement ◽

10.1111/j.1745-3984.2004.tb01109.x ◽

2004 ◽

Vol 41 (2) ◽

pp. 85-117 ◽

Cited By ~ 11

Author(s):

Yuan H. Li ◽

Robert W. Lissitz

Keyword(s):

Item Response Theory ◽

Item Response ◽

Item Parameter ◽

Standard Errors ◽

Parameter Estimates ◽

Response Theory ◽

Asymptotic Standard Errors ◽

Item Parameter Estimates


Use of Restricted Item Response Theory Models for Examining the Stability of Item Parameter Estimates Over Time

Applied Measurement in Education ◽

10.1207/s15324818ame0402_3 ◽

1991 ◽

Vol 4 (2) ◽

pp. 125-141 ◽

Cited By ~ 9

Author(s):

Clement A. Stone ◽

Suzanne Lane

Keyword(s):

Item Response Theory ◽

Item Response ◽

Item Parameter ◽

Parameter Estimates ◽

Response Theory ◽

Item Parameter Estimates ◽

The Stability ◽

Item Response Theory Models ◽

Over Time


Parameter Estimation Accuracy of the Effort-Moderated Item Response Theory Model Under Multiple Assumption Violations

Educational and Psychological Measurement ◽

10.1177/0013164420949896 ◽

2020 ◽

pp. 001316442094989

Author(s):

Joseph A. Rios ◽

James Soland

Keyword(s):

Parameter Estimation ◽

Item Response Theory ◽

Item Response ◽

Item Parameter ◽

Estimation Accuracy ◽

Parameter Estimates ◽

Response Theory ◽

Irt Model ◽

Ability Estimates ◽

Ability Parameter

As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.g., two-parameter logistic [2PL]) in parameter estimation under simulated conditions, prior research has failed to examine its performance under violations to the model’s assumptions. Therefore, the objective of this simulation study was to examine item and mean ability parameter recovery when violating the assumptions that noneffortful responding occurs randomly (Assumption 1) and is unrelated to the underlying ability of examinees (Assumption 2). Results demonstrated that, across conditions, the EM-IRT model provided robust item parameter estimates to violations of Assumption 1. However, bias values greater than 0.20 SDs were observed for the EM-IRT model when violating Assumption 2; nonetheless, these values were still lower than the 2PL model. In terms of mean ability estimates, model results indicated equal performance between the EM-IRT and 2PL models across conditions. Across both models, mean ability estimates were found to be biased by more than 0.25 SDs when violating Assumption 2. However, our accompanying empirical study suggested that this biasing occurred under extreme conditions that may not be present in some operational settings. Overall, these results suggest that the EM-IRT model provides superior item and equal mean ability parameter estimates in the presence of model violations under realistic conditions when compared with the 2PL model.
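The core idea of treating noneffortful responses as missing can be sketched as a likelihood that simply skips flagged responses. This is a minimal illustration under stated assumptions, not the authors' implementation. The effort flags are taken as given (the abstract does not say how they are obtained), the item parameters are assumed known, and the function names are hypothetical.

```python
import math

def two_pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def effort_moderated_loglik(theta, responses, effortful, items):
    """Log-likelihood of one examinee's responses, ignoring noneffortful ones.

    responses: list of 0/1 scores.
    effortful: list of booleans; False marks a response treated as missing.
    items: list of (a, b) tuples with assumed-known 2PL item parameters.
    """
    total = 0.0
    for x, flag, (a, b) in zip(responses, effortful, items):
        if not flag:
            continue  # noneffortful response contributes nothing to the likelihood
        p = two_pl(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total
```

A plain 2PL likelihood corresponds to setting every flag to `True`, so the two models differ only in which responses enter the sum.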


IRT and MIRT Models for Item Parameter Estimation With Multidimensional Multistage Tests

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998619881790 ◽

2019 ◽

Vol 45 (4) ◽

pp. 383-402

Author(s):

Paul A. Jewsbury ◽

Peter W. van Rijn

Keyword(s):

Item Response Theory ◽

Item Response ◽

Latent Variable ◽

Large Scale ◽

Real Data ◽

Item Parameter ◽

Practical Reasons ◽

Parameter Estimates ◽

Response Theory ◽

Item Parameter Estimates

In large-scale educational assessment data consistent with a simple-structure multidimensional item response theory (MIRT) model, where every item measures only one latent variable, separate unidimensional item response theory (UIRT) models for each latent variable are often calibrated for practical reasons. While this approach can be valid for data from a linear test, unacceptable item parameter estimates are obtained when data arise from a multistage test (MST). We explore this situation from a missing data perspective and show mathematically that MST data will be problematic for calibrating multiple UIRT models but not MIRT models. This occurs because some items that were used in the routing decision are excluded from the separate UIRT models, due to measuring a different latent variable. Both simulated and real data from the National Assessment of Educational Progress are used to further confirm and explore the unacceptable item parameter estimates. The theoretical and empirical results confirm that only MIRT models are valid for item calibration of multidimensional MST data.


Chapter 4: Item Response Theory Scale Linking in NAEP

Journal of Educational Statistics ◽

10.3102/10769986017002155 ◽

1992 ◽

Vol 17 (2) ◽

pp. 155-173 ◽

Cited By ~ 4

Author(s):

Kentaro Yamamoto ◽

John Mazzeo

Keyword(s):

Item Response Theory ◽

Item Response ◽

Mathematics Assessment ◽

Item Parameter ◽

Parameter Estimates ◽

Response Theory ◽

Item Parameter Estimates ◽

Common Scale ◽

Educational Assessments ◽

Scale Linking

In educational assessments, it is often necessary to compare the performance of groups of individuals who have been administered different forms of a test. If these groups are to be validly compared, all results need to be expressed on a common scale. When assessment results are to be reported using an item response theory (IRT) proficiency metric, as is done for the National Assessment of Educational Progress (NAEP), establishing a common metric becomes synonymous with expressing IRT item parameter estimates on a common scale. Procedures that accomplish this are referred to here as scale linking procedures. This chapter discusses the need for scale linking in NAEP and illustrates the specific procedures used to carry out the linking in the context of the major analyses conducted for the 1990 NAEP mathematics assessment.
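To make "expressing item parameter estimates on a common scale" concrete, the sketch below applies a linear scale transformation using the mean-sigma method on common-item difficulties. The abstract does not state which linking procedure NAEP used, so mean-sigma is an assumption picked only as a simple textbook example. The function names and the numbers in the usage lines are hypothetical.

```python
from statistics import mean, pstdev

def mean_sigma_constants(b_source, b_target):
    """Slope A and intercept B that map the source scale onto the target scale.

    b_source, b_target: difficulty estimates of the same common items,
    calibrated separately on the two scales.
    """
    A = pstdev(b_target) / pstdev(b_source)
    B = mean(b_target) - A * mean(b_source)
    return A, B

def link_item(a, b, A, B):
    """Re-express one item's (a, b) on the target scale: theta* = A * theta + B."""
    return a / A, A * b + B

# Illustrative values only: three common items whose target-scale difficulties
# are exactly 2 * b + 0.5.
A, B = mean_sigma_constants([-1.0, 0.0, 1.0], [-1.5, 0.5, 2.5])
a_linked, b_linked = link_item(1.0, 0.0, A, B)
```

Dividing the discrimination by A while rescaling the difficulty keeps the product a(θ - b) unchanged, so response probabilities are the same on either scale.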


Summed Score Likelihood–Based Indices for Testing Latent Variable Distribution Fit in Item Response Theory

Educational and Psychological Measurement ◽

10.1177/0013164417717024 ◽

2017 ◽

Vol 78 (5) ◽

pp. 857-886 ◽

Cited By ~ 4

Author(s):

Zhen Li ◽

Li Cai

Keyword(s):

Item Response Theory ◽

Item Response ◽

Latent Variable ◽

Item Parameter ◽

Parameter Estimates ◽

Test Statistics ◽

Response Theory ◽

Patient Reported ◽

Variable Distribution ◽

Distribution Fit

In standard item response theory (IRT) applications, the latent variable is typically assumed to be normally distributed. If the normality assumption is violated, the item parameter estimates can become biased. Summed score likelihood–based statistics may be useful for testing latent variable distribution fit. We develop Satorra–Bentler type moment adjustments to approximate the test statistics’ tail-area probability. A simulation study was conducted to examine the calibration and power of the unadjusted and adjusted statistics in various simulation conditions. Results show that the proposed indices have tail-area probabilities that can be closely approximated by central chi-squared random variables under the null hypothesis. Furthermore, the test statistics are focused. They are powerful for detecting latent variable distributional assumption violations, and not sensitive (correctly) to other forms of model misspecification such as multidimensionality. As a comparison, the goodness-of-fit statistic M2 has considerably lower power against latent variable nonnormality than the proposed indices. Empirical data from a patient-reported health outcomes study are used as illustration.
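Summed score likelihoods are commonly built from the distribution of the total score given the latent variable. The sketch below computes that conditional distribution for dichotomous items with the Lord-Wingersky recursion. The abstract does not name the algorithm the authors use, so this is an assumption. The step that integrates over the latent variable distribution, and the proposed fit indices themselves, are not shown.

```python
def summed_score_distribution(probs):
    """Probability of each summed score 0..n at a fixed latent trait value.

    probs: probabilities of a correct response on each of the n items,
           evaluated at one value of the latent variable (assumed given).
    """
    dist = [1.0]  # before any items are added, score 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1.0 - p)   # item answered incorrectly
            new[score + 1] += mass * p       # item answered correctly
        dist = new
    return dist
```

For two items with probability 0.5 each, the function returns probabilities 0.25, 0.5, and 0.25 for scores 0, 1, and 2.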


The Improved Genetic Algorithms Apply on Parameter Estimation of Two Parameters Logistic Model on Item Response Theory

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.2620 ◽

2013 ◽

Vol 756-759 ◽

pp. 2620-2624 ◽

Cited By ~ 1

Author(s):

Peng Dong Du ◽

Yan Hua Chu

Keyword(s):

Genetic Algorithm ◽

Parameter Estimation ◽

Item Response Theory ◽

Item Response ◽

Estimation Method ◽

Optimal Solution ◽

Item Parameter ◽

Response Theory ◽

Improvement Strategy ◽

Parameter Estimation Method

Building on a detailed examination of parameter estimation methods for the two-parameter logistic model (2PLM) in item response theory and of genetic algorithms, this paper proposes a genetic-algorithm-based parameter estimation method for the 2PLM, together with a corresponding program for estimating the parameters of different items. Starting from the genetic coding and genetic analysis, it proposes an improvement strategy for the genetic operators and a strategy for accelerating the algorithm's convergence. A verification program was written and compared with the popular BILOG software. The results showed that, within a certain range of error, the proposed algorithm can converge to the optimal solution.
