Partial Measurement Invariance: Extending and Evaluating the Cluster Approach for Identifying Anchor Items

2021 · pp. 014662162110428
Author(s): Steffi Pohl, Daniel Schulze, Eric Stets

When measurement invariance does not hold, researchers aim for partial measurement invariance by identifying anchor items that are assumed to be measurement invariant. In this paper, we build on Bechger and Maris's approach for the identification of anchor items. Instead of identifying differential item functioning (DIF)-free items, they propose identifying different sets of items that are invariant in item parameters within the same item set. We extend their approach with an additional step that allows for the identification of homogeneously functioning item sets. We evaluate the performance of the extended cluster approach under various conditions and compare it to that of previous approaches, namely the equal-mean difficulty (EMD) approach and the iterative forward approach. We show that the EMD and iterative forward approaches perform well in conditions with balanced DIF or when DIF is small. In conditions with large and unbalanced DIF, they fail to recover the true group mean differences. With appropriate threshold settings, the cluster approach identified a cluster that resulted in unbiased mean difference estimates in all conditions. Compared to previous approaches, the cluster approach accommodates a variety of different assumptions and depicts the uncertainty in the results that stems from the choice of assumption. Using a real data set, we illustrate how the assumptions of the previous approaches may be incorporated in the cluster approach and how the chosen assumption impacts the results.
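To make the clustering idea concrete, here is a minimal R sketch, not the authors' exact algorithm: fit a Rasch model separately per group, compute each item's between-group difficulty shift, and cluster items by shift; items in one tight cluster function homogeneously and are anchor candidates. The response matrix `resp`, grouping vector `group`, and the 0.3 cut height are illustrative assumptions.

```r
# Illustrative sketch of the cluster idea (not the paper's implementation).
library(mirt)

fit_b <- function(dat) {
  mod <- mirt(dat, 1, itemtype = "Rasch", verbose = FALSE)
  coef(mod, IRTpars = TRUE, simplify = TRUE)$items[, "b"]  # item difficulties
}

b_ref <- fit_b(resp[group == "ref", ])   # resp: assumed 0/1 response matrix
b_foc <- fit_b(resp[group == "foc", ])
shift <- b_foc - b_ref                   # per-item difficulty shift

# Cluster the shifts; the largest cluster of items with a homogeneous shift
# serves as the anchor set (cut height 0.3 is an assumed threshold).
cl <- cutree(hclust(dist(shift), method = "single"), h = 0.3)
anchors <- names(which(cl == as.integer(names(which.max(table(cl))))))
anchors
```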

2012 · Vol. 28(3) · pp. 201-207
Author(s): Brian F. French, Brian Hand, William J. Therrien, Juan Antonio Valdivia Vazquez

Critical thinking (CT) can be described as the conscious process a person engages in to explore a situation or problem from different perspectives. Accurate measurement of CT skills, especially across subgroups, depends in part on the measurement properties of an instrument being invariant, or similar, across those groups. The assessment of item-level invariance is a critical component of building a validity argument to ensure that scores on the Cornell Critical Thinking Test (CCTT) have similar meanings across groups. We used logistic regression to examine differential item functioning (DIF) by sex in the CCTT Form X. Results suggest that the items function similarly across boys and girls, with only 4 items (5.6%) displaying DIF. This implies that any observed mean differences are not a function of a lack of measurement invariance, and it supports the validity of the inferences drawn when comparing boys and girls on CCTT scores.
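A hedged sketch of this kind of logistic-regression DIF screen, using the difR package; the data frame `cctt`, its column names, and the focal-group label are assumptions, not the authors' code.

```r
# Sketch: logistic regression DIF by sex, via the difR package.
# `cctt` is an assumed data.frame of scored (0/1) items plus a `sex` column.
library(difR)

items <- cctt[, grep("^item", names(cctt))]
res   <- difLogistic(Data = items, group = cctt$sex, focal.name = "girl",
                     type = "both",   # test uniform and nonuniform DIF jointly
                     purify = TRUE)   # iteratively purify the matching score
res   # flags items whose likelihood ratio test exceeds the chi-square cutoff
```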


2016 · Vol. 78(2) · pp. 343-352
Author(s): Tenko Raykov, Dimiter M. Dimitrov, George A. Marcoulides, Tatyana Li, Natalja Menold

A latent variable modeling method is outlined for studying measurement invariance when evaluating latent constructs with multiple binary or binary-scored items with no guessing. The approach extends the continuous-indicator procedure described by Raykov and colleagues, similarly utilizes the false discovery rate approach to multiple testing, and permits one to locate violations of measurement invariance in loading or threshold parameters. The method does not require selection of a reference observed variable and is directly applicable for studying differential item functioning with one- or two-parameter item response models. The extended procedure is illustrated on an empirical data set.
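A minimal sketch of the general workflow (constrained ordinal CFA, score tests to locate the violated constraints, Benjamini-Hochberg correction) in lavaan; this is an analogous procedure under stated assumptions, not the authors' exact method, and the one-factor model syntax is illustrative.

```r
# Sketch: locate loading/threshold invariance violations for binary items,
# then control the false discovery rate across the many univariate tests.
library(lavaan)

model <- "f =~ y1 + y2 + y3 + y4 + y5 + y6"   # assumed one-factor model
fit <- cfa(model, data = dat, group = "grp",  # `dat`, `grp`: assumptions
           ordered = paste0("y", 1:6),
           group.equal = c("loadings", "thresholds"))

st <- lavTestScore(fit)    # univariate score test per equality constraint
st$uni$p.bh <- p.adjust(st$uni$p.value, method = "BH")  # FDR adjustment
subset(st$uni, p.bh < .05) # constraints flagged as invariance violations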


The purpose of this study was to examine differences in the sensitivity of three methods, IRT-Likelihood Ratio (IRT-LR), Mantel-Haenszel (MH), and Logistic Regression (LR), in detecting gender differential item functioning (DIF) on the National Mathematics Examination (Ujian Nasional: UN) for the 2014/2015 academic year in North Sumatera Province, Indonesia. A DIF item is unfair: it advantages test takers from one group and disadvantages those from another even when they have the same ability. DIF was examined by gender, with men as the reference group (R) and women as the focal group (F). The study used an experimental 3x1 design with one factor (method) and three treatments, the three DIF detection methods. There are five packages of the 2015 UN Mathematics test (codes 1107, 2207, 3307, 4407, and 5507). The 2207 package was taken as the sample data, comprising 5000 participants (3067 women, 1933 men) and 40 UN items. Item selection based on classical test theory (CTT) retained 32 of the 40 UN items, and selection based on item response theory (IRT) retained 18 items. Using R 3.3.3 and IRTLRDIF 2.0, 5 items were flagged as DIF by the IRT-Likelihood Ratio method (IRT-LR), 4 items by the Logistic Regression method (LR), and 3 items by the Mantel-Haenszel method (MH). Because a single DIF detection run is insufficient for testing the sensitivity of the three methods, six analysis groups were formed: (4400,40), (4400,32), (4400,18), (3000,40), (3000,32), and (3000,18); 40 random data sets (drawn without repetition) were generated in each group, and DIF detection was conducted on the items in each data set. Although model fit was imperfect, the three-parameter logistic model (3PL) was chosen as the most suitable model. Tukey's HSD post hoc test showed the IRT-LR method to be more sensitive than the MH and LR methods in the (4400,40) and (3000,40) groups. The IRT-LR method was no longer more sensitive than LR in the (4400,32) and (3000,32) groups, but remained more sensitive than MH. In the (4400,18) and (3000,18) groups, the IRT-LR method was more sensitive than LR but not significantly more sensitive than MH. The LR method was consistently more sensitive than the MH method across all analysis groups.
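For readers who want to run this kind of three-method comparison in R, a hedged sketch with the difR and mirt packages; object names are assumptions, IRTLRDIF itself is a separate standalone program, and the sketch uses the default 2PL rather than the study's 3PL for simplicity.

```r
# Sketch: Mantel-Haenszel, logistic regression, and IRT-LR DIF on one data set.
# `resp` is an assumed 0/1 response matrix; `sex` codes "F" (focal) / "R".
library(difR)
library(mirt)

mh <- difMH(Data = resp, group = sex, focal.name = "F")        # Mantel-Haenszel
lr <- difLogistic(Data = resp, group = sex, focal.name = "F")  # logistic regr.

# IRT-LR: constrain all item parameters equal across groups, then test each
# item by freeing its parameters (mirt's DIF() with the 'drop' scheme).
con <- multipleGroup(resp, 1, group = sex,
                     invariance = c("slopes", "intercepts",
                                    "free_means", "free_var"))
irtlr <- DIF(con, which.par = c("a1", "d"), scheme = "drop")
```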


2011 · Vol. 35(8) · pp. 604-622
Author(s): Hirotaka Fukuhara, Akihito Kamata

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into account, thus estimating DIF magnitude appropriately when a test is composed of testlets. A fully Bayesian estimation method was adopted for parameter estimation. The recovery of parameters was evaluated for the proposed DIF model. Simulation results revealed that the proposed bifactor MIRT DIF model produced better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model for all simulation conditions. A real data analysis was also conducted by applying the proposed DIF model to a statewide reading assessment data set.
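The testlet structure itself can be captured with a bifactor model; below is a minimal frequentist sketch with mirt's bfactor() (the authors used a fully Bayesian estimator, which this sketch does not reproduce). The testlet assignment vector and response matrix are assumptions.

```r
# Sketch: bifactor (testlet) model -- one general factor plus one specific
# factor per testlet, so local dependence within testlets is absorbed.
library(mirt)

# Assumed design: 9 items in 3 testlets of 3 items each.
testlet <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
mod <- bfactor(resp, model = testlet)   # resp: assumed 0/1 response matrix
summary(mod)  # general + specific loadings; group comparisons for DIF would
              # then proceed on the general dimension, as in multipleGroup()
```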


2020 · Vol. 45(1) · pp. 37-53
Author(s): Wenchao Ma, Ragip Terzi, Jimmy de la Torre

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes, or use the same attributes in different manners (e.g., conjunctive, disjunctive, and compensatory), to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and the Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure is also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better-calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A real data set was also analyzed to illustrate the use of these DIF detection procedures.
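A hedged sketch of Wald- and LR-based DIF testing for a cognitive diagnosis model, assuming the dif() function exposed by the GDINA R package (maintained by the first author); the data objects `dat`, `Q`, and `grp` are illustrative, and the forward anchor search described in the paper is not shown.

```r
# Sketch: item-level DIF tests under a (G-)DINA-type model, assuming the
# GDINA package's dif() interface; all object names are placeholders.
library(GDINA)

wald_res <- dif(dat, Q, group = grp, method = "wald")  # Wald tests per item
lr_res   <- dif(dat, Q, group = grp, method = "LR")    # likelihood ratio tests
wald_res
```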


2019 · Vol. 19(1)
Author(s): Zhongquan Li, Xia Zhao, Ang Sheng, Li Wang

Background: Anxiety symptoms are pervasive among elderly populations around the world. The Geriatric Anxiety Inventory (GAI) was developed and is widely used to screen for those suffering from severe symptoms. Although debates about its dimensionality were largely resolved by Molde et al. (2019) with bifactor modeling, evidence regarding its measurement invariance across sex and somatic diseases is still missing. Methods: This study attempted to provide complementary evidence on the dimensionality of the GAI with Mokken scale analysis and to examine its measurement invariance across sex and somatic diseases by conducting differential item functioning (DIF) analysis in a sample of older Chinese adults. The data were drawn from a large representative sample (N = 1314) in the Chinese National Survey Data Archive, focusing on the mental health of elderly adults. Results: Mokken scale analysis confirmed the unidimensionality of the GAI, and DIF analysis indicated measurement invariance of the inventory across sex and somatic diseases, with only a few items exhibiting item bias, all of it negligible. Conclusions: These findings support the use of the inventory among Chinese elders to screen for anxiety symptoms and to make comparisons across sex and somatic diseases.
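The Mokken step has a standard R counterpart in the mokken package; a minimal sketch follows, with the item-score matrix `gai` as an assumption.

```r
# Sketch: Mokken scale analysis of GAI items -- scalability coefficients and
# monotonicity checks bear on the unidimensionality claim.
library(mokken)

H    <- coefH(gai)                # gai: assumed matrix of 0/1 GAI item scores
mono <- check.monotonicity(gai)   # manifest monotonicity per item
summary(mono)
aisp(gai)                         # automated item selection into Mokken scales
```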


F1000Research · 2020 · Vol. 9 · pp. 782
Author(s): John J. O. Mogaka, Moses J. Chimbari

Background: Omics-based biomarkers (OBMs) inform precision medicine (PM). As omics-based technologies gradually move into clinical settings, however, the co-occurrence of biomedical research and clinical practice is likely an important variable in the implementation of PM. Currently, little is known about the implications of such research-practice co-occurrence. Methods: This study used data collected from a pilot study designed to inform a full-scale PM implementation study through validation of the measurement tool. It applied item response theory (IRT) methods to assess the tool's reliability and measurement invariance across two study subgroups associated with research and practice settings. Results: The study sample consisted of 31 participants. Measurement invariance was assessed through differential item functioning (DIF) analysis with bootstrapping via Monte Carlo simulation. Overall, 13 of the 22 items that formed the PM implementation (PMI) scale showed DIF at significance level α = 0.25. Item response functions (IRFs) revealed how members of each subgroup responded to scale items and their attitudes towards factors that influence PM implementation. Conclusions: Attitudinal similarities and differences towards factors influencing PM implementation between those in biomedical research and those in practice were established. Results indicated PM implementation knowledge that is unique to each group as well as knowledge common to both. The study established the validity and reliability of the new PM implementation measurement tool for the two subgroups.
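One common way to pair IRT-based DIF screening with Monte Carlo-derived empirical thresholds in R is the lordif package; the hedged sketch below need not match the authors' exact tooling, and the objects `pmi_items` and `setting` are assumptions.

```r
# Sketch: DIF analysis with Monte Carlo empirical thresholds, in the spirit
# of the bootstrapped DIF analysis described (lordif package).
library(lordif)

fit <- lordif(resp.data = pmi_items, group = setting,  # research vs practice
              criterion = "Chisqr", alpha = 0.25)      # alpha as in the study
mc  <- montecarlo(fit, alpha = 0.25, nr = 500)  # simulated null thresholds
summary(fit)
```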


2021
Author(s): Mirka Henninger, Rudolf Debelak, Carolin Strobl

To detect differential item functioning (DIF), Rasch trees search for optimal split points in covariates and identify subgroups of respondents in a data-driven way. To determine whether, and in which covariate, a split should be performed, Rasch trees use statistical significance tests. Consequently, Rasch trees are more likely to label small DIF effects as significant in larger samples, leading to larger trees that split the sample into more subgroups. What would be more desirable is an approach driven by effect size rather than sample size. To achieve this, we suggest implementing an additional stopping criterion: the popular ETS classification scheme based on the Mantel-Haenszel odds ratio. This criterion helps us evaluate whether a split in a Rasch tree is based on a substantial or an ignorable difference in item parameters, and it allows the Rasch tree to stop growing when DIF between the identified subgroups is small. Furthermore, it supports identifying DIF items and quantifying DIF effect sizes in each split. Based on simulation results, we conclude that the Mantel-Haenszel effect size further reduces unnecessary splits in Rasch trees under the null hypothesis, or when the sample size is large but DIF effects are negligible. To make the stopping criterion easy to use for applied researchers, we have implemented the procedure in the statistical software R. Finally, we discuss how DIF effects between different nodes in a Rasch tree can be interpreted, and we emphasize the impact of purification strategies for the Mantel-Haenszel procedure on tree stopping and DIF item classification.
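A hedged sketch of the two ingredients: grow a Rasch tree with psychotree, then gauge DIF magnitude between two terminal nodes on the ETS delta scale computed from the Mantel-Haenszel odds ratio. The authors' integrated stopping criterion may expose a different interface; the data objects, covariates, and node ids below are assumptions.

```r
# Sketch: Rasch tree, then ETS classification of DIF size between two nodes.
# `d` is an assumed data frame with a matrix column `resp` of 0/1 responses.
library(psychotree)
library(difR)

tree <- raschtree(resp ~ age + gender, data = d)  # data-driven split search
plot(tree)

# Quantify DIF between two terminal nodes on the ETS delta scale:
node <- predict(tree, newdata = d, type = "node")
keep <- node %in% c(2, 3)                         # assumed terminal node ids
mh   <- difMH(Data = d$resp[keep, ], group = node[keep], focal.name = 3)
delta <- -2.35 * log(mh$alphaMH)                  # MH odds ratio -> ETS delta
# ETS scheme: |delta| < 1 -> A (negligible), 1 to 1.5 -> B, > 1.5 -> C (large)
table(cut(abs(delta), c(0, 1, 1.5, Inf), labels = c("A", "B", "C")))
```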


2019 · Vol. 80(4) · pp. 638-664
Author(s): Georgios D. Sideridis, Ioannis Tsaousis, Abeer A. Alamri

The main aim of the present study is to use the Bayesian structural equation modeling (BSEM) methodology to establish approximate measurement invariance (A-MI), using data from a national examination in Saudi Arabia, as an alternative when strong invariance criteria are not met. We illustrate how to account for the absence of exact measurement invariance using relative rather than exact criteria. A secondary goal was to compare latent means across groups using invariant parameters only, drawing on both exact and relative evaluative criteria. The A-MI protocol suggested equivalence of the thresholds using prior variances equal to 0.10. Subsequent differences between groups were evaluated using effect size criteria and the prior-posterior predictive p-value (PPPP), which proved invaluable in attesting to differences that are beyond zero, beyond some meaningless nonzero estimate, and beyond the three commonly used effect size benchmarks described by Cohen in 1988 (i.e., .20, .50, and .80). Results substantiated the use of the PPPP for evaluating mean differences across groups when utilizing nonexact evaluative criteria.
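Approximate invariance of this kind can be sketched in R with blavaan, where exact equality constraints are relaxed into small-variance priors on cross-group differences; this assumes blavaan's `wiggle` interface, the model syntax and data objects are illustrative, and the Mplus-specific PPPP is not reproduced here.

```r
# Sketch: approximate (Bayesian) measurement invariance via small-variance
# priors on cross-group parameter differences, under stated assumptions.
library(blavaan)

model <- "f =~ x1 + x2 + x3 + x4"                 # assumed one-factor model
fit <- bcfa(model, data = dat, group = "region",  # `dat`, `region`: assumed
            group.equal = c("loadings", "intercepts"),
            wiggle = "intercepts",                # approximate, not exact
            wiggle.sd = sqrt(0.10))               # prior variance 0.10 as in text
summary(fit)
```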

