Practical Significance of Item Misfit in Educational Assessments

2017, Vol 41 (5), pp. 388-400
Author(s): Carmen Köhler, Johannes Hartig

Testing item fit is an important step when calibrating and analyzing item response theory (IRT)-based tests, as model fit is a necessary prerequisite for drawing valid inferences from estimated parameters. Numerous item fit statistics exist in the literature, and they sometimes lead to contradictory conclusions about which items should be excluded from the test. Recently, researchers have argued for shifting the focus from statistical item fit analyses to evaluating the practical consequences of item misfit. This article introduces a method to quantify the potential bias of relationship estimates (e.g., correlation coefficients) due to misfitting items. The potential deviation indicates whether item misfit is practically significant for the outcomes of substantive analyses. The method is demonstrated using data from an educational test.

Author(s): Kim Proctor

Although group consciousness is an important concept in explaining political behavior, both theoretical guidance on how to measure group consciousness and empirical consensus regarding its operationalization are lacking. This has the potential to lead to both diverging results and inaccurate empirical conclusions, which greatly limits the ability to understand the role that group consciousness plays in politics. Using data from Pew’s 2013 “Survey of LGBT Americans,” this analysis provides a foundation for measuring group consciousness using item response theory (IRT). Through an examination of dimensionality, monotonicity, model fit, and differential item functioning, the results demonstrate that many assumptions about measuring group consciousness have been incorrect. Further, the findings suggest that previous conclusions about subgroup differences may be the result of survey bias, rather than actual between-group differences. Moving forward, scholars of political behavior should use IRT to measure latent constructs.


2005, Vol 8 (1), pp. 100-110
Author(s): José Antonio López Pina, M. Dolores Hidalgo Montesinos

This paper explores the distributional properties and power rates of the Lz, Eci2z, and Eci4z statistics when they are used as item fit statistics. The results were compared to the t-transformation of the Outfit and Infit mean squares. Four sample sizes were selected: 100, 250, 500, and 1000 examinees. Abilities were drawn from uniform and normal distributions with mean 0 and standard deviation 1, and from uniform and normal distributions with mean –1 and standard deviation 1. The pseudo-guessing parameter was fixed at .25. Two ranges of difficulty parameters were selected: ±1 logits and ±2 logits. Two test lengths were selected: 15 and 30 items. The results showed important differences between the T-infit, T-outfit, Lz, Eci2z, and Eci4z statistics. The T-outfit, T-infit, and Lz statistics showed poor standardization with estimated parameters because their distributional properties were not close to the expected values. However, the Eci2z and Eci4z statistics showed satisfactory standardization under all conditions. Further, the power rates of Eci2z and Eci4z for detecting items that do not fit the Rasch model were 5% to 10% higher than those of Lz, T-outfit, and T-infit.
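For orientation, the Infit and Outfit mean squares used as the comparison baseline above can be sketched as follows. This is a minimal illustration assuming the standard Rasch definitions (Outfit as the unweighted mean of squared standardized residuals, Infit as the information-weighted version). The function names, the simulated data, and the use of true rather than estimated parameters are assumptions for the example and are not taken from the study; the t-transformation and the Lz, Eci2z, and Eci4z statistics are not covered.

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model (persons x items)."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def item_fit_mean_squares(x, theta, b):
    """Return (infit, outfit) mean squares per item for a 0/1 response matrix x."""
    p = rasch_prob(theta, b)          # expected score for each person-item pair
    w = p * (1.0 - p)                 # model variance (item information)
    sq_resid = (x - p) ** 2           # squared raw residuals
    outfit = (sq_resid / w).mean(axis=0)           # mean squared standardized residual
    infit = sq_resid.sum(axis=0) / w.sum(axis=0)   # information-weighted mean square
    return infit, outfit

# Hypothetical data: 500 examinees, 15 items with difficulties in +/-2 logits.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=500)
b = np.linspace(-2.0, 2.0, 15)
x = (rng.random((500, 15)) < rasch_prob(theta, b)).astype(float)

infit, outfit = item_fit_mean_squares(x, theta, b)
print(np.round(infit, 2))
print(np.round(outfit, 2))
```

When the data fit the model, both mean squares have an expected value of 1; values well above 1 indicate more unexpected responses than the model predicts.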


Author(s): Li Cai, Seung Won Chung, Taehun Lee

Abstract. The Tucker–Lewis index (TLI; Tucker & Lewis, 1973), also known as the non-normed fit index (NNFI; Bentler & Bonett, 1980), is one of the numerous incremental fit indices widely used in linear mean and covariance structure modeling, particularly in exploratory factor analysis, tools popular in prevention research. It augments information provided by other indices such as the root-mean-square error of approximation (RMSEA). In this paper, we develop and examine an analogous index for categorical item level data modeled with item response theory (IRT). The proposed Tucker–Lewis index for IRT (TLIRT) is based on Maydeu-Olivares and Joe's (2005) $M_2$ family of limited-information overall model fit statistics. The limited-information fit statistics have significantly better chi-square approximation and power than traditional full-information Pearson or likelihood ratio statistics under realistic situations. Building on the incremental fit assessment principle, the TLIRT compares the fit of the model under consideration along a spectrum of worst to best possible model fit scenarios. We examine the performance of the new index using simulated and empirical data. Results from a simulation study suggest that the new index behaves as theoretically expected, and it can offer additional insights about model fit not available from other sources. In addition, a more stringent cutoff value is perhaps needed than Hu and Bentler's (1999) traditional cutoff criterion with continuous variables. In the empirical data analysis, we use a data set from a measurement development project in support of cigarette smoking cessation research to illustrate the usefulness of the TLIRT. We noticed that had we only utilized the RMSEA index, we could have arrived at qualitatively different conclusions about model fit, depending on the choice of test statistics, an issue to which the TLIRT is relatively more immune.
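The incremental fit principle behind the index can be illustrated with the classic Tucker–Lewis formula, which compares the statistic-to-degrees-of-freedom ratio of a baseline (worst-case) model with that of the fitted model. The sketch below assumes that formula; the abstract does not state the exact TLIRT definition, so feeding it $M_2$-type statistics is an assumption, and the function name and input values are hypothetical.

```python
def tucker_lewis_index(stat_null, df_null, stat_model, df_model):
    """Classic TLI: (r0 - r1) / (r0 - 1), where r = fit statistic / degrees of freedom.

    stat_null, df_null   -- statistic and df of the baseline (e.g., independence) model
    stat_model, df_model -- statistic and df of the model under consideration
    """
    ratio_null = stat_null / df_null
    ratio_model = stat_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)

# Made-up values for illustration only, not results from the paper.
tli = tucker_lewis_index(stat_null=900.0, df_null=45, stat_model=70.0, df_model=35)
print(round(tli, 3))
```

A fitted model whose statistic equals its degrees of freedom yields an index of 1, and the index falls toward 0 as the fitted model approaches the baseline model's ratio.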


Author(s): Bjarne Schmalbach, Markus Zenger, Michalis P. Michaelides, Karin Schermelleh-Engel, Andreas Hinz, ...

Abstract. The common factor model – by far the most widely used model for factor analysis – assumes equal item intercepts across respondents. Due to idiosyncratic ways of understanding and answering items of a questionnaire, this assumption is often violated, leading to an underestimation of model fit. Maydeu-Olivares and Coffman (2006) suggested the introduction of a random intercept into the model to address this concern. The present study applies this method to six established instruments (measuring depression, procrastination, optimism, self-esteem, core self-evaluations, and self-regulation) with ambiguous factor structures, using data from representative general population samples. In testing and comparing three alternative factor models (one-factor model, two-factor model, and one-factor model with a random intercept) and analyzing differential correlational patterns with an external criterion, we empirically demonstrate the random intercept model’s merit, and clarify the factor structure for the above-mentioned questionnaires. In sum, we recommend the random intercept model for cases in which acquiescence is suspected to affect response behavior.


Methodology, 2014, Vol 10 (4), pp. 138-152
Author(s): Hsien-Yuan Hsu, Susan Troncoso Skidmore, Yan Li, Bruce Thompson

The purpose of the present paper was to evaluate the effect of constraining near-zero parameter cross-loadings to zero in the measurement component of a structural equation model. A 3 × 5 × 2 Monte Carlo simulation was conducted (i.e., sample sizes of 200, 600, and 1,000; parameter cross-loadings of 0.07, 0.10, 0.13, 0.16, and 0.19 misspecified to be zero; and parameter path coefficients in the structural model of either 0.50 or 0.70). Results indicated that factor pattern coefficients and factor covariances were overestimated in measurement models when the near-zero parameter cross-loadings constrained to zero were higher than 0.13 in the population. Moreover, the path coefficients between factors were misestimated when the near-zero parameter cross-loadings constrained to zero were noteworthy. Our results add to the literature detailing the importance of testing individual model specification decisions, and not simply evaluating omnibus model fit statistics.
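The fully crossed design described above can be enumerated as a simple grid. The sketch below only lists the 30 conditions; the variable names and dictionary keys are assumptions for illustration, and the data generation and model fitting steps of the study are not reproduced.

```python
from itertools import product

# Factor levels as reported in the abstract.
sample_sizes = [200, 600, 1000]
cross_loadings = [0.07, 0.10, 0.13, 0.16, 0.19]   # population values fixed to zero in the fitted model
path_coefficients = [0.50, 0.70]

# Fully crossed 3 x 5 x 2 design: one dictionary per simulation condition.
conditions = [
    {"n": n, "cross_loading": cl, "path_coefficient": pc}
    for n, cl, pc in product(sample_sizes, cross_loadings, path_coefficients)
]

print(len(conditions))
print(conditions[0])
```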


EMJ Radiology, 2020
Author(s): Filippo Pesapane

Radiomics is a science that investigates a large number of features from medical images using data-characterisation algorithms, with the aim of analysing disease characteristics that are indistinguishable to the naked eye. Radiogenomics attempts to establish and examine the relationship between tumour genomic characteristics and their radiologic appearance. Although there is certainly a lot to learn from these relationships, one could ask the question: what is the practical significance of radiogenomic discoveries? The increasing interest in such applications inevitably raises numerous legal and ethical questions. In an environment such as the technology field, which changes quickly and unpredictably, regulations need to be timely in order to be relevant. This paper analyses the issues that must be resolved to make future applications of this innovative technology safe and useful.


Sensors, 2021, Vol 21 (2), pp. 601
Author(s): Marco Germanotta, Ilaria Mileti, Ilaria Conforti, Zaccaria Del Prete, Irene Aprile, ...

The body’s center of mass (CoM) trajectory is typically estimated using force platforms or optoelectronic systems (OS), which confines the assessment to a laboratory setting. The use of magneto-inertial measurement units (MIMUs) allows for more ecological evaluations, and previous studies proposed methods based on either a single sensor or a sensor network. In this study, we compared the accuracy of two methods based on MIMUs. Body CoM was estimated during six postural tasks performed by 15 healthy subjects, using data collected by a single sensor on the pelvis (Strapdown Integration Method, SDI), and seven sensors on the pelvis and lower limbs (Biomechanical Model, BM). The accuracy of the two methods was compared in terms of RMSE and estimation of posturographic parameters, using an OS as reference. The RMSE of the SDI was lower in tasks with little or no oscillation, while the BM performed better in tasks with greater CoM displacement. Moreover, higher correlation coefficients were obtained between the posturographic parameters obtained with the BM and the OS. Our findings showed that the estimation of CoM displacement based on MIMUs was reasonably accurate, and that the sensor-network method should be preferred for estimating the kinematic parameters.
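The accuracy comparison above rests on the root-mean-square error between an estimated trajectory and the reference trajectory. A minimal sketch of that metric follows, assuming both trajectories are sampled at the same time points; the function name and the sample values are hypothetical and do not come from the study.

```python
import numpy as np

def rmse(estimate, reference):
    """Root-mean-square error between an estimated and a reference trajectory."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((estimate - reference) ** 2)))

# Hypothetical CoM displacement samples (arbitrary units), not data from the study.
reference_com = [0.0, 1.0, 2.0, 1.0, 0.0]
estimated_com = [0.1, 0.9, 2.2, 1.1, -0.1]
print(rmse(estimated_com, reference_com))
```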

