An Exploratory Strategy to Identify and Define Sources of Differential Item Functioning

2020 · Vol 44 (7-8) · pp. 548-560
Author(s): Chung-Ping Cheng, Chi-Chen Chen, Ching-Lin Shih

The sources of differential item functioning (DIF) in flagged items are usually identified through a qualitative content review by a panel of experts. However, the differential functioning of some items may be caused by factors outside the experts' experience, so the sources of DIF for these items can be misidentified. Quantitative methods can provide useful information, such as an item's DIF status and the number of DIF sources, which makes the item review and revision process more efficient and precise. However, current quantitative methods assume that all possible sources are known in advance and collected alongside the item response data, which is not always the case in practice. To this end, this study proposes an exploratory strategy, combined with the MIMIC (multiple-indicator multiple-cause) method, that can be used to identify and name new sources of DIF. The performance of this strategy was investigated through simulation. The results showed that when a set of DIF-free items can be correctly identified to define the main dimension, the proposed exploratory MIMIC method accurately recovers the number of possible DIF sources and the items belonging to each. A real-data analysis was also conducted to demonstrate how the strategy can be applied in practice. The results and findings of this study are discussed further.
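In standard MIMIC DIF notation (a brief sketch of the model family the strategy builds on, not notation taken from the article), the latent response to item $i$ can be written as

$$
y_i^{*} = \lambda_i \theta + \beta_i g + \varepsilon_i, \qquad \theta = \gamma g + \zeta,
$$

where $\theta$ is the primary latent trait, $g$ a DIF-source covariate, $\gamma$ the impact of $g$ on the trait, and a nonzero direct effect $\beta_i$ flags uniform DIF on item $i$. The exploratory strategy targets the case in which the covariates that would play the role of $g$ are not available in advance and must instead be uncovered, and named, from the item response data itself.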

2019 · Vol 35 (6) · pp. 823-833
Author(s): Desiree Thielemann, Felicitas Richter, Bernd Strauss, Elmar Braehler, Uwe Altmann, ...

Abstract. Most instruments for the assessment of disordered eating were developed and validated in young female samples, yet they are often used in heterogeneous general population samples. Brief instruments of disordered eating should therefore assess the severity of disordered eating equally well across individuals of different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population (N = 2,527) using a multigroup item response theory (IRT) approach and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in either questionnaire. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males at the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF in the SCOFF appeared to be negligible. Both questionnaires are equally fair across people of different ages and SES levels. The DIF by gender found for the EAT-8 as a screening instrument may also be reflected in the use of different cutoff values for men and women. In general, both brief instruments for assessing disordered eating revealed strengths and limitations concerning test fairness for different groups.
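The study itself relies on multigroup IRT and a MIMIC SEM; as a rough observed-score analogue only (not the authors' analysis), uniform and nonuniform DIF in a single dichotomous item can be screened with the classic logistic-regression procedure sketched below. The data frame and its column names (`item`, `total`, `group`) are hypothetical.

```python
# Logistic-regression DIF screening for one dichotomous item: an observed-score
# approximation, not the multigroup IRT / MIMIC SEM analysis used in the study.
# Assumed columns: "item" (0/1 response), "total" (rest or total score),
# "group" (0 = reference, 1 = focal).
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

def lr_dif_screen(df: pd.DataFrame) -> dict:
    m0 = smf.logit("item ~ total", data=df).fit(disp=0)                        # no DIF
    m1 = smf.logit("item ~ total + group", data=df).fit(disp=0)                # + uniform DIF
    m2 = smf.logit("item ~ total + group + total:group", data=df).fit(disp=0)  # + nonuniform DIF
    lr_uniform = 2 * (m1.llf - m0.llf)       # likelihood-ratio statistics,
    lr_nonuniform = 2 * (m2.llf - m1.llf)    # each on 1 degree of freedom
    return {
        "p_uniform": stats.chi2.sf(lr_uniform, df=1),
        "p_nonuniform": stats.chi2.sf(lr_nonuniform, df=1),
    }
```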


2020 · Vol 45 (1) · pp. 37-53
Author(s): Wenchao Ma, Ragip Terzi, Jimmy de la Torre

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes, or use the same attributes but in different ways (e.g., conjunctive, disjunctive, or compensatory), to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and the Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure is also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better-calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A real data set was also analyzed to illustrate the use of these DIF detection procedures.
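The forward anchor search can be sketched generically as a greedy loop. The sketch below is a schematic outline under simplifying assumptions, not the authors' CDM-specific algorithm; `dif_statistic` is a hypothetical callback (e.g., a Wald or LR statistic computed with the current anchor items constrained to be invariant, or with all other items as a provisional anchor while the set is still empty).

```python
# Generic forward anchor-item search (schematic sketch, not the authors' exact
# procedure). `dif_statistic(item, anchor)` is a user-supplied callback that
# returns a Wald- or LR-type DIF statistic for `item`, treating the items in
# `anchor` as invariant across groups (or all other items when `anchor` is empty).
from typing import Callable, Iterable, List

def forward_anchor_search(
    items: Iterable[int],
    dif_statistic: Callable[[int, List[int]], float],
    n_anchors: int = 4,
) -> List[int]:
    """Greedily build a set of presumably DIF-free anchor items."""
    candidates = list(items)
    anchor: List[int] = []
    while candidates and len(anchor) < n_anchors:
        # Test every remaining candidate against the current anchor set and
        # keep the one showing the least evidence of DIF.
        stats = {j: dif_statistic(j, anchor) for j in candidates}
        best = min(stats, key=stats.get)
        anchor.append(best)
        candidates.remove(best)
    return anchor
```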


Assessment · 2017 · Vol 26 (6) · pp. 1001-1013
Author(s): David C. Cicero, Elizabeth A. Martin, Alexander Krieg

The Wisconsin Schizotypy Scales, including their brief versions, are among the most commonly used self-report measures of schizotypy. Although they have been used extensively in many ethnic groups, few studies have examined their differential item functioning (DIF) across groups. The current study included 1,056 Asian, 408 White, 476 Multiethnic, and 372 Hispanic undergraduates. Unidimensional models of the brief Magical Ideation and Perceptual Aberration scales fit the data well. For both scales, global tests of measurement invariance provided mixed evidence, but few of the items displayed DIF across ethnicities or between sexes within a multiple-indicator multiple-cause (MIMIC) model. For the full versions of the scales and the brief Revised Social Anhedonia Scale, MIMIC models within an exploratory structural equation modeling framework likewise found that few of the items had DIF. These findings suggest that some of the items may have different psychometric properties across groups, but most items do not.


2016 · Vol 50 (6) · pp. 165
Author(s): Tetiana V. Lisova

A necessary condition for biased assessment by a test is differential item functioning (DIF) across groups of test takers. This article describes the ideas behind several statistical methods for detecting DIF, developed within the main approaches to modeling test results: contingency tables, regression models, multidimensional models, and item response theory (IRT) models. The Mantel-Haenszel procedure, the logistic regression method, SIBTEST, and the IRT likelihood ratio test are considered, and the characteristics of each method and the conditions for its application are specified. An overview of existing free software tools implementing these methods is provided, and the methods are compared on real data. The article also notes that it is advisable to use several methods simultaneously to reduce the risk of false conclusions.
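As a concrete illustration of the first of these methods, the following minimal numpy sketch computes the Mantel-Haenszel common odds ratio and chi-square for one dichotomous item, stratifying on the rest score; it is a bare-bones textbook implementation, and the variable names and edge-case handling are illustrative rather than taken from any of the software tools reviewed in the article.

```python
import numpy as np

def mantel_haenszel_dif(responses, group, item):
    """Mantel-Haenszel DIF check for one dichotomous item.

    responses : (n_persons, n_items) 0/1 matrix
    group     : length n_persons array, 0 = reference, 1 = focal
    item      : column index of the studied item
    Returns the MH common odds ratio (1 = no DIF) and the MH chi-square
    statistic with continuity correction, stratified by rest score.
    """
    y = responses[:, item]
    rest = responses.sum(axis=1) - y                 # matching criterion
    num = den = sum_a = sum_ea = sum_var = 0.0
    for s in np.unique(rest):
        m = rest == s
        a = np.sum((group[m] == 0) & (y[m] == 1))    # reference, correct
        b = np.sum((group[m] == 0) & (y[m] == 0))    # reference, incorrect
        c = np.sum((group[m] == 1) & (y[m] == 1))    # focal, correct
        d = np.sum((group[m] == 1) & (y[m] == 0))    # focal, incorrect
        t = a + b + c + d
        if t < 2 or (a + c) == 0 or (b + d) == 0:
            continue                                 # stratum carries no information
        num += a * d / t
        den += b * c / t
        n_ref, n_foc, m1, m0 = a + b, c + d, a + c, b + d
        sum_a += a
        sum_ea += n_ref * m1 / t
        sum_var += n_ref * n_foc * m1 * m0 / (t ** 2 * (t - 1))
    alpha_mh = num / den if den > 0 else np.nan
    chi2_mh = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var if sum_var > 0 else np.nan
    return alpha_mh, chi2_mh
```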


2021
Author(s): Ben Stenhaug, Michael C. Frank, Benjamin Domingue

Differential item functioning (DIF) is a popular technique within the item-response theory framework for detecting test items that are biased against particular demographic groups. The last thirty years have brought significant methodological advances in detecting DIF. Still, typical methods—such as matching on sum scores or identifying anchor items—are based exclusively on internal criteria and therefore rely on a crucial piece of circular logic: items with DIF are identified via an assumption that other items do not have DIF. This logic is an attempt to solve an easy-to-overlook identification problem at the beginning of most DIF detection. We explore this problem, which we describe as the Fundamental DIF Identification Problem, in depth here. We suggest three steps for determining whether it is surmountable and DIF detection results can be trusted. (1) Examine raw item response data for potential DIF. To this end, we introduce a new graphical method for visualizing potential DIF in raw item response data. (2) Compare the results of a variety of methods. These methods, which we describe in detail, include commonly-used anchor item methods, recently-proposed anchor point methods, and our suggested adaptations. (3) Interpret results in light of the possibility of DIF methods failing. We illustrate the basic challenge and the methodological options using the classic verbal aggression data and a simulation study. We recommend best practices for cautious DIF detection.
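The paper introduces its own graphical method; the snippet below is not that method, only a common and very simple way to eyeball raw item responses for potential DIF: plot each item's observed endorsement rate against the rest score, separately by group, and look for a gap between the curves.

```python
# Simple raw-data DIF plot (illustrative; not the graphical method proposed in
# the paper): observed endorsement rate vs. rest score, drawn per group.
import numpy as np
import matplotlib.pyplot as plt

def plot_empirical_item_curves(responses, group, item, min_n=10):
    """responses: (n_persons, n_items) 0/1 matrix; group: 0 = reference, 1 = focal."""
    y = responses[:, item]
    rest = responses.sum(axis=1) - y                 # score on the remaining items
    for g, label in [(0, "reference"), (1, "focal")]:
        xs, ys = [], []
        for s in np.unique(rest):
            m = (rest == s) & (group == g)
            if m.sum() >= min_n:                     # skip sparsely populated points
                xs.append(s)
                ys.append(y[m].mean())
        plt.plot(xs, ys, marker="o", label=label)
    plt.xlabel("rest score (studied item excluded)")
    plt.ylabel("proportion endorsing the item")
    plt.title(f"Empirical item response curves, item {item}")
    plt.legend()
    plt.show()
```

A visible gap between the two curves at the same rest score is a raw-data hint of potential DIF, not a formal test.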


2021 · pp. 014662162110428
Author(s): Steffi Pohl, Daniel Schulze, Eric Stets

When measurement invariance does not hold, researchers aim for partial measurement invariance by identifying anchor items that are assumed to be measurement invariant. In this paper, we build on Bechger and Maris's approach to the identification of anchor items. Instead of identifying differential item functioning (DIF)-free items, they propose identifying sets of items whose item parameters are invariant within the same set. We extend their approach by an additional step that allows homogeneously functioning item sets to be identified. We evaluate the performance of the extended cluster approach under various conditions and compare it to that of previous approaches, namely the equal-mean-difficulty (EMD) approach and the iterative forward approach. We show that the EMD and iterative forward approaches perform well in conditions with balanced DIF or when DIF is small. In conditions with large and unbalanced DIF, they fail to recover the true group mean differences. With appropriate threshold settings, the cluster approach identified a cluster that resulted in unbiased mean difference estimates in all conditions. Compared to previous approaches, the cluster approach allows for a variety of different assumptions as well as for depicting the uncertainty in the results that stems from the choice of assumption. Using a real data set, we illustrate how the assumptions of the previous approaches may be incorporated in the cluster approach and how the chosen assumption affects the results.
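A rough sketch of the core idea follows: under a Rasch-type model, items can be calibrated separately per group and then clustered by how much their relative difficulties shift between groups, with items in the same cluster functioning homogeneously. The inputs `b_ref` and `b_foc`, the linkage settings, and the 0.3-logit threshold are illustrative assumptions, not the paper's implementation or settings.

```python
# Clustering items by their between-group difficulty shift (illustrative sketch
# of the general idea, not the authors' implementation or threshold settings).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_items_by_dif(b_ref, b_foc, threshold=0.3):
    """b_ref, b_foc: item difficulties from separate calibrations per group.

    Each calibration is identified only up to an additive constant, so the
    shifts d_i = b_foc[i] - b_ref[i] share one unknown constant; items whose
    shifts cluster together can be treated as functioning homogeneously, and
    the largest cluster is a natural candidate anchor set.
    """
    d = np.asarray(b_foc) - np.asarray(b_ref)
    z = linkage(d.reshape(-1, 1), method="average")
    labels = fcluster(z, t=threshold, criterion="distance")
    sizes = np.bincount(labels)                      # cluster labels start at 1
    anchor_cluster = sizes[1:].argmax() + 1
    anchor_items = np.flatnonzero(labels == anchor_cluster)
    return labels, anchor_items
```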


2013 · Vol 21 (spe) · pp. 163-171
Author(s): Erika de Souza Guedes, Luiz Carlos Orozco-Vargas, Ruth Natália Teresa Turrini, Regina Márcia Cardoso de Sousa, Mariana Alvina dos Santos, ...

OBJECTIVES: The objective of this study was to evaluate the items of the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). METHOD: The psychometric properties of the questionnaire were investigated through Rasch analysis. RESULTS: Data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (mean age 44.1, SD = 9.5; 13.0% men). The subscales Choices, Awareness, Freedom, and Involvement were tested separately and showed unidimensionality; the response categories of the items were collapsed from 7 to 3 levels, and the items fit the model well, except for the following/leading item, whose infit and outfit values were above 1.4; this item also showed differential item functioning (DIF) by participant role. Item reliability was 0.99, and person reliability ranged from 0.80 to 0.84 across the subscales. No items with extremely high difficulty were identified. CONCLUSIONS: The PKPCT should not be viewed as unidimensional, items with extremely high difficulty need to be added to the scale, and the differential functioning of some items requires further investigation.
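For readers unfamiliar with the fit statistics cited above, here is a minimal numpy sketch of infit and outfit mean squares for a dichotomous Rasch model, given person measures `theta` and item difficulties `b` estimated elsewhere; the PKPCT items are polytomous (collapsed to 3 levels), so this is a simplified illustration rather than the analysis reported in the study.

```python
import numpy as np

def rasch_fit_statistics(x, theta, b):
    """Infit/outfit mean squares for dichotomous Rasch data (simplified sketch).

    x     : (n_persons, n_items) 0/1 response matrix
    theta : person measure estimates, length n_persons
    b     : item difficulty estimates, length n_items
    Values near 1 indicate adequate fit; the study flags values above 1.4.
    """
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # expected score
    w = p * (1.0 - p)                                         # model variance
    z2 = (x - p) ** 2 / w                                     # squared standardized residuals
    outfit = z2.mean(axis=0)                                  # unweighted mean square
    infit = (w * z2).sum(axis=0) / w.sum(axis=0)              # information-weighted mean square
    return infit, outfit
```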

