An Exploratory Strategy to Identify and Define Sources of Differential Item Functioning

2020 · Vol 44 (7-8) · pp. 548-560
Author(s): Chung-Ping Cheng, Chi-Chen Chen, Ching-Lin Shih

The sources of differential item functioning (DIF) in flagged items are usually identified through a qualitative content review by a panel of experts. However, the differential functioning of some items may be caused by factors outside the experts' experience, so the sources of DIF for these items can be misidentified. Quantitative methods can provide useful information, such as an item's DIF status and the number of DIF sources, which makes the item review and revision process more efficient and precise. However, current quantitative methods assume that all possible sources are known in advance and collected alongside the item response data, which is not always the case in practice. To this end, this study proposes an exploratory strategy, combined with the MIMIC (multiple-indicator multiple-cause) method, that can be used to identify and name new sources of DIF. The performance of this strategy was investigated through simulation. The results showed that when a set of DIF-free items can be correctly identified to define the main dimension, the proposed exploratory MIMIC method accurately recovers the number of possible DIF sources and the items belonging to each. A real-data analysis was also conducted to demonstrate how the strategy can be applied in practice. The results and findings of this study are discussed further.
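In standard MIMIC DIF notation (a brief sketch of the model family the strategy builds on, not notation taken from the article), the latent response to item $i$ can be written as

$$
y_i^{*} = \lambda_i \theta + \beta_i g + \varepsilon_i, \qquad \theta = \gamma g + \zeta,
$$

where $\theta$ is the primary latent trait, $g$ a DIF-source covariate, $\gamma$ the impact of $g$ on the trait, and a nonzero direct effect $\beta_i$ flags uniform DIF on item $i$. The exploratory strategy targets the case in which the covariates that would play the role of $g$ are not available in advance and must instead be uncovered, and named, from the item response data itself.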

2019 · Vol 35 (6) · pp. 823-833
Author(s): Desiree Thielemann, Felicitas Richter, Bernd Strauss, Elmar Braehler, Uwe Altmann, ...

Abstract. Most instruments for the assessment of disordered eating were developed and validated in young female samples, yet they are often used in heterogeneous general population samples. Brief instruments of disordered eating should therefore assess the severity of disordered eating equally well across individuals of different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population (N = 2,527) using a multigroup item response theory (IRT) approach and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in either questionnaire. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males at the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF in the SCOFF appeared to be negligible. Both questionnaires are equally fair across people of different ages and SES levels. The DIF by gender found for the EAT-8 as a screening instrument may also be reflected in the use of different cutoff values for men and women. In general, both brief instruments for assessing disordered eating revealed strengths and limitations concerning test fairness for different groups.
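The study itself relies on multigroup IRT and a MIMIC SEM; as a rough observed-score analogue only (not the authors' analysis), uniform and nonuniform DIF in a single dichotomous item can be screened with the classic logistic-regression procedure sketched below. The data frame and its column names (`item`, `total`, `group`) are hypothetical.

```python
# Logistic-regression DIF screening for one dichotomous item: an observed-score
# approximation, not the multigroup IRT / MIMIC SEM analysis used in the study.
# Assumed columns: "item" (0/1 response), "total" (rest or total score),
# "group" (0 = reference, 1 = focal).
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

def lr_dif_screen(df: pd.DataFrame) -> dict:
    m0 = smf.logit("item ~ total", data=df).fit(disp=0)                        # no DIF
    m1 = smf.logit("item ~ total + group", data=df).fit(disp=0)                # + uniform DIF
    m2 = smf.logit("item ~ total + group + total:group", data=df).fit(disp=0)  # + nonuniform DIF
    lr_uniform = 2 * (m1.llf - m0.llf)       # likelihood-ratio statistics,
    lr_nonuniform = 2 * (m2.llf - m1.llf)    # each on 1 degree of freedom
    return {
        "p_uniform": stats.chi2.sf(lr_uniform, df=1),
        "p_nonuniform": stats.chi2.sf(lr_nonuniform, df=1),
    }
```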


2020 · Vol 45 (1) · pp. 37-53
Author(s): Wenchao Ma, Ragip Terzi, Jimmy de la Torre

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes, or use the same attributes but in different ways (e.g., conjunctive, disjunctive, or compensatory), to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and the Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure is also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better-calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A real data set was also analyzed to illustrate the use of these DIF detection procedures.
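The forward anchor search can be sketched generically as a greedy loop. The sketch below is a schematic outline under simplifying assumptions, not the authors' CDM-specific algorithm; `dif_statistic` is a hypothetical callback (e.g., a Wald or LR statistic computed with the current anchor items constrained to be invariant, or with all other items as a provisional anchor while the set is still empty).

```python
# Generic forward anchor-item search (schematic sketch, not the authors' exact
# procedure). `dif_statistic(item, anchor)` is a user-supplied callback that
# returns a Wald- or LR-type DIF statistic for `item`, treating the items in
# `anchor` as invariant across groups (or all other items when `anchor` is empty).
from typing import Callable, Iterable, List

def forward_anchor_search(
    items: Iterable[int],
    dif_statistic: Callable[[int, List[int]], float],
    n_anchors: int = 4,
) -> List[int]:
    """Greedily build a set of presumably DIF-free anchor items."""
    candidates = list(items)
    anchor: List[int] = []
    while candidates and len(anchor) < n_anchors:
        # Test every remaining candidate against the current anchor set and
        # keep the one showing the least evidence of DIF.
        stats = {j: dif_statistic(j, anchor) for j in candidates}
        best = min(stats, key=stats.get)
        anchor.append(best)
        candidates.remove(best)
    return anchor
```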


Assessment · 2017 · Vol 26 (6) · pp. 1001-1013
Author(s): David C. Cicero, Elizabeth A. Martin, Alexander Krieg

The Wisconsin Schizotypy Scales, including their brief versions, are among the most commonly used self-report measures of schizotypy. Although they have been used extensively in many ethnic groups, few studies have examined their differential item functioning (DIF) across groups. The current study included 1,056 Asian, 408 White, 476 Multiethnic, and 372 Hispanic undergraduates. Unidimensional models of the brief Magical Ideation and Perceptual Aberration scales fit the data well. For both scales, global tests of measurement invariance provided mixed evidence, but few of the items displayed DIF across ethnicities or between sexes within a multiple-indicator multiple-cause (MIMIC) model. For the full versions of the scales and the brief Revised Social Anhedonia Scale, MIMIC models within an exploratory structural equation modeling framework likewise found that few of the items had DIF. These findings suggest that some of the items may have different psychometric properties across groups, but most items do not.


2016 · Vol 50 (6) · pp. 165
Author(s): Tetiana V. Lisova

A necessary condition for biased assessment by a test is differential item functioning (DIF) across groups of test takers. This article describes the ideas behind several statistical methods for detecting DIF, developed within the main approaches to modeling test results: contingency tables, regression models, multidimensional models, and item response theory (IRT) models. The Mantel-Haenszel procedure, the logistic regression method, SIBTEST, and the IRT likelihood ratio test are considered, and the characteristics of each method and the conditions for its application are specified. An overview of existing free software tools implementing these methods is provided, and the methods are compared on real data. The article also notes that it is advisable to use several methods simultaneously to reduce the risk of false conclusions.
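As a concrete illustration of the first of these methods, the following minimal numpy sketch computes the Mantel-Haenszel common odds ratio and chi-square for one dichotomous item, stratifying on the rest score; it is a bare-bones textbook implementation, and the variable names and edge-case handling are illustrative rather than taken from any of the software tools reviewed in the article.

```python
import numpy as np

def mantel_haenszel_dif(responses, group, item):
    """Mantel-Haenszel DIF check for one dichotomous item.

    responses : (n_persons, n_items) 0/1 matrix
    group     : length n_persons array, 0 = reference, 1 = focal
    item      : column index of the studied item
    Returns the MH common odds ratio (1 = no DIF) and the MH chi-square
    statistic with continuity correction, stratified by rest score.
    """
    y = responses[:, item]
    rest = responses.sum(axis=1) - y                 # matching criterion
    num = den = sum_a = sum_ea = sum_var = 0.0
    for s in np.unique(rest):
        m = rest == s
        a = np.sum((group[m] == 0) & (y[m] == 1))    # reference, correct
        b = np.sum((group[m] == 0) & (y[m] == 0))    # reference, incorrect
        c = np.sum((group[m] == 1) & (y[m] == 1))    # focal, correct
        d = np.sum((group[m] == 1) & (y[m] == 0))    # focal, incorrect
        t = a + b + c + d
        if t < 2 or (a + c) == 0 or (b + d) == 0:
            continue                                 # stratum carries no information
        num += a * d / t
        den += b * c / t
        n_ref, n_foc, m1, m0 = a + b, c + d, a + c, b + d
        sum_a += a
        sum_ea += n_ref * m1 / t
        sum_var += n_ref * n_foc * m1 * m0 / (t ** 2 * (t - 1))
    alpha_mh = num / den if den > 0 else np.nan
    chi2_mh = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var if sum_var > 0 else np.nan
    return alpha_mh, chi2_mh
```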


2021
Author(s): Ben Stenhaug, Michael C. Frank, Benjamin Domingue

Differential item functioning (DIF) is a popular technique within the item-response theory framework for detecting test items that are biased against particular demographic groups. The last thirty years have brought significant methodological advances in detecting DIF. Still, typical methods—such as matching on sum scores or identifying anchor items—are based exclusively on internal criteria and therefore rely on a crucial piece of circular logic: items with DIF are identified via an assumption that other items do not have DIF. This logic is an attempt to solve an easy-to-overlook identification problem at the beginning of most DIF detection. We explore this problem, which we describe as the Fundamental DIF Identification Problem, in depth here. We suggest three steps for determining whether it is surmountable and DIF detection results can be trusted. (1) Examine raw item response data for potential DIF. To this end, we introduce a new graphical method for visualizing potential DIF in raw item response data. (2) Compare the results of a variety of methods. These methods, which we describe in detail, include commonly-used anchor item methods, recently-proposed anchor point methods, and our suggested adaptations. (3) Interpret results in light of the possibility of DIF methods failing. We illustrate the basic challenge and the methodological options using the classic verbal aggression data and a simulation study. We recommend best practices for cautious DIF detection.
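The paper introduces its own graphical method; the snippet below is not that method, only a common and very simple way to eyeball raw item responses for potential DIF: plot each item's observed endorsement rate against the rest score, separately by group, and look for a gap between the curves.

```python
# Simple raw-data DIF plot (illustrative; not the graphical method proposed in
# the paper): observed endorsement rate vs. rest score, drawn per group.
import numpy as np
import matplotlib.pyplot as plt

def plot_empirical_item_curves(responses, group, item, min_n=10):
    """responses: (n_persons, n_items) 0/1 matrix; group: 0 = reference, 1 = focal."""
    y = responses[:, item]
    rest = responses.sum(axis=1) - y                 # score on the remaining items
    for g, label in [(0, "reference"), (1, "focal")]:
        xs, ys = [], []
        for s in np.unique(rest):
            m = (rest == s) & (group == g)
            if m.sum() >= min_n:                     # skip sparsely populated points
                xs.append(s)
                ys.append(y[m].mean())
        plt.plot(xs, ys, marker="o", label=label)
    plt.xlabel("rest score (studied item excluded)")
    plt.ylabel("proportion endorsing the item")
    plt.title(f"Empirical item response curves, item {item}")
    plt.legend()
    plt.show()
```

A visible gap between the two curves at the same rest score is a raw-data hint of potential DIF, not a formal test.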


2021 · pp. 014662162110428
Author(s): Steffi Pohl, Daniel Schulze, Eric Stets

When measurement invariance does not hold, researchers aim for partial measurement invariance by identifying anchor items that are assumed to be measurement invariant. In this paper, we build on Bechger and Maris's approach to the identification of anchor items. Instead of identifying differential item functioning (DIF)-free items, they propose identifying sets of items whose item parameters are invariant within the same set. We extend their approach by an additional step that allows homogeneously functioning item sets to be identified. We evaluate the performance of the extended cluster approach under various conditions and compare it to that of previous approaches, namely the equal-mean-difficulty (EMD) approach and the iterative forward approach. We show that the EMD and iterative forward approaches perform well in conditions with balanced DIF or when DIF is small. In conditions with large and unbalanced DIF, they fail to recover the true group mean differences. With appropriate threshold settings, the cluster approach identified a cluster that resulted in unbiased mean difference estimates in all conditions. Compared to previous approaches, the cluster approach allows for a variety of different assumptions as well as for depicting the uncertainty in the results that stems from the choice of assumption. Using a real data set, we illustrate how the assumptions of the previous approaches may be incorporated in the cluster approach and how the chosen assumption affects the results.
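A rough sketch of the core idea follows: under a Rasch-type model, items can be calibrated separately per group and then clustered by how much their relative difficulties shift between groups, with items in the same cluster functioning homogeneously. The inputs `b_ref` and `b_foc`, the linkage settings, and the 0.3-logit threshold are illustrative assumptions, not the paper's implementation or settings.

```python
# Clustering items by their between-group difficulty shift (illustrative sketch
# of the general idea, not the authors' implementation or threshold settings).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_items_by_dif(b_ref, b_foc, threshold=0.3):
    """b_ref, b_foc: item difficulties from separate calibrations per group.

    Each calibration is identified only up to an additive constant, so the
    shifts d_i = b_foc[i] - b_ref[i] share one unknown constant; items whose
    shifts cluster together can be treated as functioning homogeneously, and
    the largest cluster is a natural candidate anchor set.
    """
    d = np.asarray(b_foc) - np.asarray(b_ref)
    z = linkage(d.reshape(-1, 1), method="average")
    labels = fcluster(z, t=threshold, criterion="distance")
    sizes = np.bincount(labels)                      # cluster labels start at 1
    anchor_cluster = sizes[1:].argmax() + 1
    anchor_items = np.flatnonzero(labels == anchor_cluster)
    return labels, anchor_items
```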


2013 · Vol 21 (spe) · pp. 163-171
Author(s): Erika de Souza Guedes, Luiz Carlos Orozco-Vargas, Ruth Natália Teresa Turrini, Regina Márcia Cardoso de Sousa, Mariana Alvina dos Santos, ...

OBJECTIVES: The objective of this study was to evaluate the items of the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). METHOD: The psychometric properties of the questionnaire were investigated through Rasch analysis. RESULTS: Data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (mean age 44.1, SD = 9.5; 13.0% men). The subscales Choices, Awareness, Freedom, and Involvement were tested separately and showed unidimensionality; the response categories of the items were collapsed from 7 to 3 levels, and the items fit the model well, except for the following/leading item, whose infit and outfit values were above 1.4; this item also showed differential item functioning (DIF) by participant role. Item reliability was 0.99, and person reliability ranged from 0.80 to 0.84 across the subscales. No items with extremely high difficulty were identified. CONCLUSIONS: The PKPCT should not be viewed as unidimensional, items with extremely high difficulty need to be added to the scale, and the differential functioning of some items requires further investigation.
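For readers unfamiliar with the fit statistics cited above, here is a minimal numpy sketch of infit and outfit mean squares for a dichotomous Rasch model, given person measures `theta` and item difficulties `b` estimated elsewhere; the PKPCT items are polytomous (collapsed to 3 levels), so this is a simplified illustration rather than the analysis reported in the study.

```python
import numpy as np

def rasch_fit_statistics(x, theta, b):
    """Infit/outfit mean squares for dichotomous Rasch data (simplified sketch).

    x     : (n_persons, n_items) 0/1 response matrix
    theta : person measure estimates, length n_persons
    b     : item difficulty estimates, length n_items
    Values near 1 indicate adequate fit; the study flags values above 1.4.
    """
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # expected score
    w = p * (1.0 - p)                                         # model variance
    z2 = (x - p) ** 2 / w                                     # squared standardized residuals
    outfit = z2.mean(axis=0)                                  # unweighted mean square
    infit = (w * z2).sum(axis=0) / w.sum(axis=0)              # information-weighted mean square
    return infit, outfit
```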

