Examining Measurement Invariance and Differential Item Functioning With Discrete Latent Construct Indicators: A Note on a Multiple Testing Procedure

2016 ◽  
Vol 78 (2) ◽  
pp. 343-352 ◽  
Author(s):  
Tenko Raykov ◽  
Dimiter M. Dimitrov ◽  
George A. Marcoulides ◽  
Tatyana Li ◽  
Natalja Menold

A latent variable modeling method is outlined for studying measurement invariance when latent constructs are evaluated with multiple binary or binary-scored items with no guessing. The approach extends the continuous-indicator procedure described by Raykov and colleagues, similarly utilizes the false discovery rate approach to multiple testing, and permits one to locate violations of measurement invariance in loading or threshold parameters. The method does not require selection of a reference observed variable and is directly applicable to studying differential item functioning with one- or two-parameter item response models. The extended procedure is illustrated on an empirical data set.
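To make the multiple-testing step concrete, the following is a minimal sketch of the Benjamini-Hochberg false discovery rate procedure applied to per-parameter invariance tests, assuming p-values from two-group comparisons of loadings and thresholds are already available (the p-values and level shown here are illustrative, not from the paper).

import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Largest k with p_(k) <= (k/m) * q; reject the k smallest p-values.
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p-values for loading/threshold differences of six items
p_vals = [0.001, 0.020, 0.150, 0.004, 0.300, 0.048]
print(benjamini_hochberg(p_vals, q=0.05))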

2021 ◽  
pp. 014662162110428
Author(s):  
Steffi Pohl ◽  
Daniel Schulze ◽  
Eric Stets

When measurement invariance does not hold, researchers aim for partial measurement invariance by identifying anchor items that are assumed to be measurement invariant. In this paper, we build on Bechger and Maris's approach to the identification of anchor items. Instead of identifying differential item functioning (DIF)-free items, they propose identifying different sets of items whose item parameters are invariant within the same set. We extend their approach with an additional step that allows identification of homogeneously functioning item sets. We evaluate the performance of the extended cluster approach under various conditions and compare it to that of previous approaches, namely the equal-mean-difficulty (EMD) approach and the iterative forward approach. We show that the EMD and iterative forward approaches perform well in conditions with balanced DIF or when DIF is small. In conditions with large and unbalanced DIF, they fail to recover the true group mean differences. With appropriate threshold settings, the cluster approach identified a cluster that resulted in unbiased mean difference estimates in all conditions. Compared to previous approaches, the cluster approach accommodates a variety of different assumptions and depicts the uncertainty in the results that stems from the choice of assumption. Using a real data set, we illustrate how the assumptions of the previous approaches may be incorporated in the cluster approach and how the chosen assumption affects the results.
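As an illustration of the clustering idea (not the authors' implementation), the sketch below groups items by the similarity of their between-group difficulty shifts and returns the largest group as a candidate anchor set; the difficulty estimates, threshold, and data are hypothetical.

import numpy as np

def candidate_anchor_cluster(b_ref, b_foc, threshold=0.3):
    """Single-linkage grouping of item difficulty shifts; returns the largest cluster."""
    shifts = np.asarray(b_foc, dtype=float) - np.asarray(b_ref, dtype=float)
    order = np.argsort(shifts)
    clusters, current = [], [int(order[0])]
    for prev, nxt in zip(order, order[1:]):
        if shifts[nxt] - shifts[prev] < threshold:
            current.append(int(nxt))
        else:
            clusters.append(current)
            current = [int(nxt)]
    clusters.append(current)
    return max(clusters, key=len)

b_ref = np.array([-1.0, 0.2, 0.8, -0.3, 1.1, 0.0])
b_foc = np.array([-0.9, 0.3, 1.6, -0.2, 1.2, 0.9])   # items 2 and 5 shift between groups
print(candidate_anchor_cluster(b_ref, b_foc))        # -> [0, 1, 3, 4]

Varying the threshold corresponds loosely to varying the assumption about how homogeneous an anchor set must be, which is one way the uncertainty mentioned in the abstract can be made explicit.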


2017 ◽  
Vol 78 (5) ◽  
pp. 781-804 ◽  
Author(s):  
Stella Bollmann ◽  
Moritz Berger ◽  
Gerhard Tutz

Various methods to detect differential item functioning (DIF) in item response models are available. However, most of these methods assume that responses are binary, so methods for ordered response categories are scarce. In the present article, DIF in the widely used partial credit model is investigated. An item-focused tree is proposed that allows the detection of DIF items, which might affect the performance of the partial credit model. The method uses tree methodology, yielding a tree for each item that is detected as a DIF item. The visualization as trees makes the results easily accessible, as the obtained trees show which variables induce DIF and in which way. The new method is compared with alternative approaches, and simulations demonstrate its performance.
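A much-simplified stand-in for the per-item split search is sketched below: for one item, candidate cut points of a covariate are scanned and the item-score distributions in the two resulting nodes are compared with a chi-square test (the paper instead tests group differences in partial credit model item parameters); all names and data are illustrative.

import numpy as np
from scipy.stats import chi2_contingency

def best_split_for_item(item_scores, covariate, n_categories):
    """Return (cut point, p-value) of the most discriminating binary split
    for one polytomous item with integer scores 0..n_categories-1."""
    item_scores = np.asarray(item_scores, dtype=int)
    covariate = np.asarray(covariate)
    best_cut, best_p = None, 1.0
    for cut in np.unique(covariate)[:-1]:
        left = covariate <= cut
        table = np.array([
            np.bincount(item_scores[left], minlength=n_categories),
            np.bincount(item_scores[~left], minlength=n_categories),
        ])
        table = table[:, table.sum(axis=0) > 0]          # drop empty categories
        if table.shape[1] < 2 or (table.sum(axis=1) == 0).any():
            continue
        p = chi2_contingency(table)[1]
        if p < best_p:
            best_cut, best_p = cut, p
    return best_cut, best_p

scores = np.array([0, 1, 2, 2, 1, 0, 2, 1, 0, 2, 2, 1])
age = np.array([25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80])
print(best_split_for_item(scores, age, n_categories=3))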


2010 ◽  
Vol 35 (2) ◽  
pp. 174-193 ◽  
Author(s):  
Matthias von Davier ◽  
Sandip Sinharay

This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models extend item response theory (IRT) to a latent variable model with covariates serving as predictors of the conditional distribution of ability. Applications to estimating latent regression models for National Assessment of Educational Progress (NAEP) data from the 2000 Grade 4 mathematics assessment and the 2002 Grade 8 reading assessment are presented, and results of the proposed method are compared with results obtained using current operational procedures.
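The following toy sketch, under strong simplifications (known item parameters, fixed residual variance), illustrates the general shape of such an algorithm: a Metropolis-Hastings sweep draws each examinee's ability given the current latent regression, and a Robbins-Monro average of the draws feeds the regression update. It is not the operational NAEP procedure; all names and data are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def irt_loglik(theta, y, a, b):
    # 2PL log-likelihood of one examinee's responses y at ability theta
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def mh_sweep(theta, y, X, beta, sigma, a, b, step=0.5):
    # One Metropolis-Hastings update of every examinee's ability
    mu = X @ beta
    for i in range(theta.size):
        prop = theta[i] + step * rng.standard_normal()
        log_ratio = (irt_loglik(prop, y[i], a, b)
                     - irt_loglik(theta[i], y[i], a, b)
                     - 0.5 * ((prop - mu[i]) ** 2 - (theta[i] - mu[i]) ** 2) / sigma ** 2)
        if np.log(rng.random()) < log_ratio:
            theta[i] = prop
    return theta

def saem_latent_regression(y, X, a, b, n_iter=100, sigma=1.0):
    # Stochastic-approximation EM: MH draws of ability, Robbins-Monro
    # averaging, then a least-squares update of the regression coefficients
    theta = np.zeros(y.shape[0])
    theta_bar = np.zeros_like(theta)
    beta = np.zeros(X.shape[1])
    for t in range(1, n_iter + 1):
        theta = mh_sweep(theta, y, X, beta, sigma, a, b)
        gamma = 1.0 / t                                   # decreasing gain
        theta_bar += gamma * (theta - theta_bar)
        beta = np.linalg.lstsq(X, theta_bar, rcond=None)[0]
    return beta

# Hypothetical data: 300 examinees, 10 items, one covariate plus an intercept
n, k = 300, 10
a_true, b_true = np.ones(k), np.linspace(-1.5, 1.5, k)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
theta_true = X @ np.array([0.2, 0.5]) + rng.standard_normal(n)
y = (rng.random((n, k)) < 1.0 / (1.0 + np.exp(-(theta_true[:, None] - b_true)))).astype(int)
print(saem_latent_regression(y, X, a_true, b_true))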


2015 ◽  
Vol 14 (1) ◽  
pp. 1-19 ◽  
Author(s):  
Rosa J. Meijer ◽  
Thijmen J.P. Krebs ◽  
Jelle J. Goeman

We present a multiple testing method for hypotheses that are ordered in space or time. Given such hypotheses, the elementary hypotheses as well as regions of consecutive hypotheses are of interest. These region hypotheses not only have intrinsic meaning, but testing them also has the advantage that (potentially small) signals across a region are combined in one test. Because the expected number and length of potentially interesting regions are usually not available beforehand, we propose a method that tests all possible region hypotheses as well as all individual hypotheses in a single multiple testing procedure that controls the familywise error rate. We start by testing the global null hypothesis, and when this hypothesis can be rejected we continue by specifying the exact location or locations of the effect. The method is implemented in the
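A deliberately simplified sketch of the top-down idea follows: the global region is tested first, and sub-regions (here, recursive halves) are only examined once their parent has been rejected. Fisher's method is used purely for illustration; the paper's region definitions, combination rule, and familywise error rate control are more refined, so this sketch only conveys the hierarchical structure.

import numpy as np
from scipy.stats import combine_pvalues

def test_regions(p_values, alpha=0.05, lo=0, hi=None, rejected=None):
    """Recursively test region [lo, hi) and, if rejected, its two halves."""
    if hi is None:
        hi = len(p_values)
    if rejected is None:
        rejected = []
    region_p = combine_pvalues(p_values[lo:hi], method="fisher")[1]
    if region_p <= alpha:
        rejected.append((lo, hi))
        if hi - lo > 1:
            mid = (lo + hi) // 2
            test_regions(p_values, alpha, lo, mid, rejected)
            test_regions(p_values, alpha, mid, hi, rejected)
    return rejected

p = np.array([0.40, 0.35, 0.001, 0.003, 0.002, 0.50, 0.45, 0.60])
print(test_regions(p))   # nested regions zooming in on the signal around indices 2-4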


The purpose of this study was to examine differences in the sensitivity of three methods, IRT-Likelihood Ratio (IRT-LR), Mantel-Haenszel (MH), and Logistic Regression (LR), in detecting gender differential item functioning (DIF) on the National Mathematics Examination (Ujian Nasional: UN) for the 2014/2015 academic year in North Sumatera Province, Indonesia. A DIF item indicates unfairness: it advantages test takers from certain groups and disadvantages those from other groups even when they have the same ability. DIF was examined by gender, with men as the reference group (R) and women as the focal group (F). The study used an experimental 3x1 design with one factor (method) and three treatments, the three DIF detection methods. There are five UN Mathematics 2015 test packages (codes 1107, 2207, 3307, 4407, and 5507). The 2207 package was taken as the sample data, comprising 5,000 participants (3,067 women, 1,933 men) and 40 UN items. Item selection based on classical test theory (CTT) retained 32 of the 40 items, and selection based on item response theory (IRT) retained 18 items. Using R 3.3.3 and IRTLRDIF 2.0, 5 items were flagged as DIF by the IRT-Likelihood Ratio method (IRT-LR), 4 by the Logistic Regression method (LR), and 3 by the Mantel-Haenszel method (MH). Because a single DIF detection run is not sufficient to assess sensitivity, six analysis groups were formed: (4400,40), (4400,32), (4400,18), (3000,40), (3000,32), and (3000,18); 40 random data sets (without repetition) were generated in each group, and DIF detection was conducted on each data set. Although the data show imperfect model fit, the three-parameter logistic model (3PL) was chosen as the most suitable model. Based on Tukey's HSD post hoc test, the IRT-LR method was more sensitive than the MH and LR methods in the (4400,40) and (3000,40) groups. IRT-LR was no longer more sensitive than LR in the (4400,32) and (3000,32) groups, but remained more sensitive than MH. In the (4400,18) and (3000,18) groups, IRT-LR was more sensitive than LR but not significantly more sensitive than MH. The LR method was consistently more sensitive than the MH method across all analysis groups.
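For the Mantel-Haenszel method compared above, the core computation is the common odds ratio across score strata. The sketch below shows it for a single dichotomous item, with examinees stratified by rest score (total score excluding the studied item); variable names and layout are illustrative rather than tied to the UN data.

import numpy as np

def mantel_haenszel_alpha(correct, is_ref, rest_score):
    # Mantel-Haenszel common odds ratio for one item; alpha > 1 means the
    # reference group outperforms the focal group at comparable ability
    correct = np.asarray(correct, dtype=bool)
    is_ref = np.asarray(is_ref, dtype=bool)
    rest_score = np.asarray(rest_score)
    num = den = 0.0
    for s in np.unique(rest_score):
        stratum = rest_score == s
        a = np.sum(correct & is_ref & stratum)       # reference, correct
        b = np.sum(~correct & is_ref & stratum)      # reference, incorrect
        c = np.sum(correct & ~is_ref & stratum)      # focal, correct
        d = np.sum(~correct & ~is_ref & stratum)     # focal, incorrect
        total = a + b + c + d
        if total > 0:
            num += a * d / total
            den += b * c / total
    return num / den if den > 0 else float("inf")

# ETS reports DIF on the delta scale, delta_MH = -2.35 * ln(alpha_MH);
# |delta_MH| of roughly 1.5 or more (and significant) is conventionally treated as large DIF.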


2021 ◽  
pp. 23-42
Author(s):  
Karen H. Larwin ◽  
Milton Harvey

The current investigation uses latent variable modeling to investigate Subjective Well-Being (SWB). As a follow-up to Larwin, Harvey, and Constantinou (2020), subjective well-being is presented through a third-order factor model that explains two second-order factors, SWB and Interpersonal Experiences (IES), while incorporating measures of relationship and resiliency self-evaluations. Additionally, the current investigation considers differential item functioning not addressed in the existing SWB literature.
JEL classification numbers: C1, C3, C4, C9.
Keywords: Subjective Well-Being, Satisfaction with Life Scale, Subjective Happiness Scale, Brief Resiliency Scale, Relationship Assessment Scale, Multiple Indicators Multiple Causes (MIMIC), Weighted least squares mean variance adjusted estimator (WLSMV).


2011 ◽  
Vol 35 (8) ◽  
pp. 604-622 ◽  
Author(s):  
Hirotaka Fukuhara ◽  
Akihito Kamata

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into account, thus estimating DIF magnitude appropriately when a test is composed of testlets. A fully Bayesian estimation method was adopted for parameter estimation. The recovery of parameters was evaluated for the proposed DIF model. Simulation results revealed that the proposed bifactor MIRT DIF model produced better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model for all simulation conditions. A real data analysis was also conducted by applying the proposed DIF model to a statewide reading assessment data set.
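The following illustrative response function shows the kind of bifactor testlet structure the abstract describes: each item loads on a general ability and on its testlet factor, and a group-specific shift on the studied item represents uniform DIF (parameter names here are generic, not the authors' notation).

import numpy as np

def bifactor_dif_prob(theta_g, theta_testlet, a_g, a_t, b, dif, focal):
    """P(correct) under a 2PL-type bifactor testlet model with a uniform DIF
    shift `dif` applied to the item difficulty for focal-group members."""
    logit = a_g * theta_g + a_t * theta_testlet - (b + dif * focal)
    return 1.0 / (1.0 + np.exp(-logit))

# Same abilities, but focal membership lowers P(correct) when dif > 0
print(bifactor_dif_prob(0.5, -0.2, a_g=1.2, a_t=0.6, b=0.1, dif=0.4, focal=0))
print(bifactor_dif_prob(0.5, -0.2, a_g=1.2, a_t=0.6, b=0.1, dif=0.4, focal=1))

Ignoring the testlet factor (setting a_t to zero, as a traditional unidimensional DIF model effectively does) folds shared testlet variance into the DIF estimate, which is the distortion the proposed model is designed to avoid.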


2016 ◽  
Vol 6 (2) ◽  
pp. 30-41
Author(s):  
Mark Chang ◽  
Xuan Deng ◽  
John Balser

2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Zhongquan Li ◽  
Xia Zhao ◽  
Ang Sheng ◽  
Li Wang

Background: Anxiety symptoms are pervasive among elderly populations around the world. The Geriatric Anxiety Inventory (GAI) has been developed and widely used to screen those suffering from severe symptoms. Although debates about its dimensionality have largely been resolved by Molde et al. (2019) with bifactor modeling, evidence regarding its measurement invariance across sex and somatic diseases is still missing.
Methods: This study attempted to provide complementary evidence on the dimensionality of the GAI with Mokken scale analysis and to examine its measurement invariance across sex and somatic diseases by conducting differential item functioning (DIF) analysis in a sample of older Chinese adults. The data were the responses of a large representative sample (N = 1314) from the Chinese National Survey Data Archive, focusing on the mental health of elderly adults.
Results: Mokken scale analysis confirmed the unidimensionality of the GAI, and DIF analysis indicated measurement invariance of the inventory across sex and somatic diseases, with only a few items exhibiting item bias, all of it negligible.
Conclusions: These findings support the use of the inventory among older Chinese adults to screen for anxiety symptoms and to make comparisons across sex and somatic diseases.
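For the Mokken scale analysis step, the central quantity is Loevinger's scalability coefficient H. A rough sketch of the scale-level computation for dichotomous items is given below; dedicated software (e.g., the mokken package in R) implements the complete analysis, and the data here are random and purely illustrative.

import numpy as np

def loevinger_H(X):
    """Scale-level scalability H for a persons-by-items 0/1 matrix
    (values near 0 mean no scale; 0.3 is a common lower bound for a Mokken scale)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    p = X.mean(axis=0)                      # item popularities
    observed = expected = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            # Guttman error: harder item passed while the easier item is failed
            observed += np.sum((X[:, hard] == 1) & (X[:, easy] == 0))
            expected += n * p[hard] * (1 - p[easy])
    return 1.0 - observed / expected

# Hypothetical 0/1 responses of 200 people to five anxiety items
X = np.random.default_rng(1).integers(0, 2, size=(200, 5))
print(loevinger_H(X))   # near 0 for random data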

