Assessing the Sensitivity of Weighted Least Squares Model Fit Indexes to Local Dependence in Item Response Theory Models

2016 ◽  
Vol 24 (3) ◽  
pp. 331-340 ◽  
Author(s):  
Anne Corinne Huggins-Manley ◽  
HyunSuk Han
Author(s):  
Dagmar Amtmann ◽  
Alyssa Bamer ◽  
Kara McMullen ◽  
Colleen M Ryan ◽  
Jeffrey C Schneider ◽  
...  

Abstract Objective: The Burn Specific Health Scale-Brief (BSHS-B) is a commonly used burn-specific health outcome measure comprising 40 items across nine subscales. The objective of this study was to use both classical and modern psychometric methods to evaluate the psychometric properties of the BSHS-B. Methods: Data were collected after burn injury by a multisite, federally funded project tracking long-term outcomes. We examined dimensionality, local dependence, item fit, functioning of response categories, homogeneity, and floor and ceiling effects. Items were fit to Item Response Theory models for evaluation. Results: A total of 653 adults with burn injury completed the BSHS-B. Factor analyses supported unidimensionality for each subscale, but not for a total score based on all 40 items. All nine subscales had significant ceiling effects. Six item pairs displayed local dependence, suggesting redundancy, and 11 items did not fit the Item Response Theory models. At least 15 items had too many response options. Conclusions: The results identified numerous psychometric issues with the BSHS-B. A single summary score should not be used for any purpose. The psychometric properties of the scale need to be improved by removing redundant items, reducing response categories, and modifying or deleting problematic items. Additional conceptual work is needed, at a minimum, to revise the work subscale and, optimally, to revisit and clearly define the constructs measured by all the subscales. Additional items are needed to address ceiling effects.
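Local dependence between item pairs, as reported above, is commonly screened with a residual-correlation statistic such as Yen's Q3: after removing each person's model-expected item score, residuals of locally independent items should be nearly uncorrelated, so a pair with a large |Q3| is flagged as redundant. A minimal sketch with synthetic data (not the BSHS-B; all parameters below are hypothetical, and items 2 and 3 are deliberately made dependent):

```python
import numpy as np

def q3_matrix(responses, expected):
    """Yen's Q3: correlations among item residuals (observed minus
    model-expected scores). A pair with |Q3| well above the other
    off-diagonal values suggests local dependence."""
    resid = responses - expected
    q3 = np.corrcoef(resid, rowvar=False)   # item-by-item correlation matrix
    np.fill_diagonal(q3, np.nan)            # self-correlations are uninformative
    return q3

# Toy 1PL-style data: 500 persons, 4 items.
rng = np.random.default_rng(42)
n, k = 500, 4
theta = rng.normal(size=(n, 1))                 # person abilities
b = np.array([-1.0, 0.0, 0.5, 0.5])             # hypothetical item difficulties
p = 1 / (1 + np.exp(-(theta - b)))              # model-expected scores, n x k
u = rng.uniform(size=(n, k))
u[:, 3] = u[:, 2]                               # items 2 and 3 share the same draw
responses = (u < p).astype(float)               # -> built-in local dependence

q3 = q3_matrix(responses, p)
print(np.round(q3, 2))                          # q3[2, 3] stands out from the rest
```

The dependent pair produces a residual correlation near 1, while independent pairs hover near 0; in practice a cutoff such as |Q3| > 0.2 above the mean off-diagonal value is often used.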


2014 ◽  
Vol 22 (2) ◽  
pp. 323-341 ◽  
Author(s):  
Dheeraj Raju ◽  
Xiaogang Su ◽  
Patricia A. Patrician

Background and Purpose: The purpose of this article is to introduce different types of item response theory models and to demonstrate their usefulness by evaluating the Practice Environment Scale. Methods: Item response theory models, including the constrained and unconstrained graded response models, the partial credit model, the Rasch model, and the one-parameter logistic model, are demonstrated. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are used as model selection criteria. Results: The unconstrained graded response and partial credit models showed the best fit to the data. Almost all items in the instrument performed well. Conclusions: Although most items strongly measure the construct, a few items could be eliminated without substantially altering the instrument. The analysis revealed that the instrument may function differently when administered to different unit types.
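The AIC/BIC comparison described above trades off fit against complexity: both criteria start from -2 log-likelihood, AIC adds 2 per parameter, and BIC adds log(n) per parameter, so BIC penalizes richly parameterized models (such as an unconstrained graded response model) more heavily as the sample grows. A minimal sketch; the log-likelihoods and parameter counts below are hypothetical, not the article's results:

```python
import math

def aic(loglik, n_params):
    # AIC = -2 log L + 2k; lower is better
    return -2 * loglik + 2 * n_params

def bic(loglik, n_params, n_obs):
    # BIC = -2 log L + k log n; stronger penalty for large samples
    return -2 * loglik + n_params * math.log(n_obs)

# Hypothetical fits of two competing IRT models to the same data:
models = {
    "rasch":           {"loglik": -5120.4, "k": 31},   # one difficulty per item
    "graded_response": {"loglik": -5050.9, "k": 155},  # slope + thresholds per item
}
n_obs = 700
for name, m in models.items():
    print(f"{name}: AIC={aic(m['loglik'], m['k']):.1f}, "
          f"BIC={bic(m['loglik'], m['k'], n_obs):.1f}")
```

With both criteria, the model with the lower value is preferred; when AIC and BIC disagree, BIC is usually favoring the more parsimonious model.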


2017 ◽  
Vol 6 (4) ◽  
pp. 113
Author(s):  
Esin Yilmaz Kogar ◽  
Hülya Kelecioglu

The purpose of this research was first to estimate the item and ability parameters, and the standard errors of those parameters, obtained from the Unidimensional Item Response Theory (UIRT), bifactor (BIF), and Testlet Response Theory (TRT) models in tests containing testlets, as the number of testlets, the number of independent items, and the sample size change, and then to compare the results. The mathematics test from PISA 2012 served as the data collection tool, and 36 items were used to construct six data sets containing different numbers of testlets and independent items. From these data sets, three sample sizes of 250, 500, and 1,000 persons were drawn randomly. The findings showed that the lowest mean error values were generally those obtained from UIRT, and that TRT yielded a lower mean estimation error than BIF. Under all conditions, the models that account for local dependence provided better model-data fit than UIRT; there was generally no meaningful difference between BIF and TRT, and both models can be used for these data sets. When there was a meaningful difference between the two models, BIF generally yielded the better result. In addition, in each sample size and data set, the correlations between the item and ability parameter estimates, and between the errors of those parameters, were generally high.
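The testlet response model referred to above extends a standard dichotomous IRT model with a person-specific random effect for each testlet, so that items sharing a reading passage (or other common stimulus) remain correlated after conditioning on overall ability. A minimal sketch of the item response probability under one common sign convention; the parameter values are illustrative only:

```python
import math

def p_correct_testlet(theta, gamma_d, a=1.0, b=0.0):
    """2PL-style testlet model: theta is overall ability, gamma_d is the
    person's random effect for the testlet containing this item. Items in
    the same testlet share gamma_d, which induces local dependence that a
    unidimensional model ignores. (Sign conventions for gamma_d vary.)"""
    return 1 / (1 + math.exp(-a * (theta - b + gamma_d)))

# Two items in the same testlet share gamma_d; an independent item has gamma_d = 0.
p_item_in_testlet = p_correct_testlet(theta=0.0, gamma_d=0.8, a=1.2, b=0.3)
p_independent     = p_correct_testlet(theta=0.0, gamma_d=0.0, a=1.2, b=0.3)
print(p_item_in_testlet, p_independent)
```

Setting the variance of gamma_d to zero collapses the model back to ordinary 2PL, which is why UIRT is nested within TRT and fit comparisons between them are meaningful.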

