Analysis of Differential Item Functioning (DIF) Using Hierarchical Logistic Regression Models

2002 ◽  
Vol 27 (1) ◽  
pp. 53-75 ◽  
Author(s):  
David B. Swanson ◽  
Brian E. Clauser ◽  
Susan M. Case ◽  
Ronald J. Nungester ◽  
Carol Featherman

Over the past 25 years a range of parametric and nonparametric methods have been developed for analyzing Differential Item Functioning (DIF). These procedures are typically performed for each item individually or for small numbers of related items. Because the analytic procedures focus on individual items, it has been difficult to pool information across items to identify potential sources of DIF analytically. In this article, we outline an approach to DIF analysis using hierarchical logistic regression that makes it possible to combine results of logistic regression analyses across items to identify consistent sources of DIF, to quantify the proportion of explained variation in DIF coefficients, and to compare the predictive accuracy of alternate explanations for DIF. The approach can also be used to improve the accuracy of DIF estimates for individual items by applying empirical Bayes techniques, with DIF-related item characteristics serving as collateral information. To illustrate the hierarchical logistic regression procedure, we use a large data set derived from recent computer-based administrations of Step 2, the clinical science component of the United States Medical Licensing Examination (USMLE®). Results of a small Monte Carlo study of the accuracy of the DIF estimates are also reported.
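The empirical Bayes step mentioned above can be illustrated with a minimal sketch. It assumes the textbook normal-normal shrinkage formula, which the abstract does not confirm is the authors' exact estimator. The function name `shrink_dif` and all numeric inputs are hypothetical.

```python
def shrink_dif(estimate, std_error, predicted, tau2):
    """Shrink one item's DIF estimate toward a model-based prediction.

    estimate  : the item's own DIF coefficient (e.g. from a logistic regression)
    std_error : standard error of that coefficient
    predicted : DIF predicted from item characteristics (the collateral information)
    tau2      : residual between-item variance of DIF not explained by the predictors

    All four inputs are assumed to be available from earlier fitting steps.
    """
    # Weight on the item's own estimate: large when the estimate is precise
    # (small std_error) or when items vary a lot around the prediction.
    weight = tau2 / (tau2 + std_error ** 2)
    return weight * estimate + (1.0 - weight) * predicted


# Hypothetical values: a noisy estimate of 0.60 is pulled toward a prediction of 0.10.
shrunk = shrink_dif(estimate=0.60, std_error=0.30, predicted=0.10, tau2=0.09)
```

With these made-up inputs the two variances are equal, so the shrunken value falls halfway between the item's own estimate and the prediction.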

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Richard Johnston ◽  
Xiaohan Yan ◽  
Tatiana M. Anderson ◽  
Edwin A. Mitchell

The effect of altitude on the risk of sudden infant death syndrome (SIDS) has been reported previously, but with conflicting findings. We aimed to examine whether the risk of sudden unexpected infant death (SUID) varies with altitude in the United States. Data from the Centers for Disease Control and Prevention (CDC)’s Cohort Linked Birth/Infant Death Data Set for births between 2005 and 2010 were examined. County of birth was used to estimate altitude. Logistic regression and a Generalized Additive Model (GAM) were used, adjusting for year, mother’s race, Hispanic origin, marital status, age, education and smoking, father’s age and race, number of prenatal visits, plurality, live birth order, and infant’s sex, birthweight and gestation. There were 25,305,778 live births over the 6-year study period. The total number of deaths from SUID in this period was 23,673 (rate = 0.94/1000 live births). In the logistic regression model there was a small, but statistically significant, increased risk of SUID associated with birth at > 8000 feet compared with < 6000 feet (aOR = 1.93; 95% CI 1.00–3.71). The GAM showed a similar increased risk over 8000 feet, but this was not statistically significant. Only 9245 (0.037%) of mothers gave birth at > 8000 feet during the study period and 10 deaths (0.042%) were attributed to SUID. The number of SUID deaths at this altitude in the United States is very small (10 deaths in 6 years).
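The rates and percentages quoted above can be reproduced from the reported counts. The short check below uses only figures stated in the abstract. The variable names are illustrative, and the crude high-altitude rate on the last line is an unadjusted figure that the abstract does not report.

```python
# Counts as reported in the abstract.
live_births = 25_305_778
suid_deaths = 23_673
high_alt_births = 9_245   # births at > 8000 feet
high_alt_deaths = 10      # SUID deaths among those births

# Overall SUID rate per 1000 live births (reported as 0.94).
rate_per_1000 = suid_deaths / live_births * 1000

# Share of all births that occurred at > 8000 feet (reported as 0.037%).
pct_births_high = high_alt_births / live_births * 100

# Share of all SUID deaths that occurred at > 8000 feet (reported as 0.042%).
pct_deaths_high = high_alt_deaths / suid_deaths * 100

# Crude (unadjusted) SUID rate at > 8000 feet; not reported in the abstract.
crude_high_rate_per_1000 = high_alt_deaths / high_alt_births * 1000
```

The reported 0.042% is therefore the high-altitude share of all SUID deaths, not a death rate among high-altitude births.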


2021 ◽  
Author(s):  
Zhenling Jiang

This paper studies price bargaining when both parties have left-digit bias when processing numbers. The empirical analysis focuses on the auto finance market in the United States, using a large data set of 35 million auto loans. Incorporating left-digit bias in bargaining is motivated by several intriguing observations. The scheduled monthly payments of auto loans bunch at both $9- and $0-ending digits, especially over $100 marks. In addition, $9-ending loans carry a higher interest rate, and $0-ending loans have a lower interest rate. We develop a Nash bargaining model that allows for left-digit bias from both consumers and finance managers of auto dealers. Results suggest that both parties are subject to this basic human bias: the perceived difference between $9- and the next $0-ending payments is larger than $1, especially between $99- and $00-ending payments. The proposed model can explain the phenomena of payments bunching and differential interest rates for loans with different ending digits. We use counterfactuals to show a nuanced impact of left-digit bias, which can both increase and decrease the payments. Overall, bias from both sides leads to a $33 increase in average payment per loan compared with a benchmark case with no bias. This paper was accepted by Matthew Shum, marketing.


The purpose of this study was to examine differences in the sensitivity of three methods, IRT-Likelihood Ratio (IRT-LR), Mantel-Haenszel (MH), and Logistic Regression (LR), in detecting gender differential item functioning (DIF) on the National Mathematics Examination (Ujian Nasional: UN) for the 2014/2015 academic year in North Sumatera Province, Indonesia. An item with DIF is unfair: it advantages test takers in one group and disadvantages those in another, even when they have the same ability. DIF was examined by gender, with men as the reference group (R) and women as the focal group (F). The study used an experimental method with a 3x1 design: one factor (method) with three treatments, namely the three DIF detection methods. There are five UN Mathematics 2015 test packages (codes 1107, 2207, 3307, 4407, and 5507). Package 2207 was taken as the sample data, comprising 5000 participants (3067 women, 1933 men) and 40 UN items. Item selection based on classical test theory (CTT) retained 32 of the 40 items, and selection based on item response theory (IRT) retained 18 items. Using R 3.333 and IRTLRDIF 2.0, 5 items were flagged as DIF by the IRT-LR method, 4 by the LR method, and 3 by the MH method. Because a single round of DIF detection is not enough to test the sensitivity of the three methods, six analysis groups were formed: (4400,40), (4400,32), (4400,18), (3000,40), (3000,32), and (3000,18). In each group, 40 random data sets (without repetition) were generated, and DIF detection was run on the items in each data set. Although the data lacked model fit, the three-parameter logistic (3PL) model was chosen as the most suitable model.
Tukey's HSD post hoc tests showed that the IRT-LR method was more sensitive than the MH and LR methods in groups (4400,40) and (3000,40). In groups (4400,32) and (3000,32), IRT-LR was no longer more sensitive than LR but remained more sensitive than MH. In groups (4400,18) and (3000,18), IRT-LR was more sensitive than LR but not significantly more sensitive than MH. The LR method was consistently more sensitive than the MH method across all analysis groups.
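For readers unfamiliar with the Mantel-Haenszel method compared above, the sketch below computes the standard MH common odds ratio across score strata and the ETS delta transformation. It does not reproduce the study's analysis. The function names and the counts in `example_strata` are hypothetical.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio pooled over matched ability (total-score) strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference group, item correct    b = reference group, item incorrect
      c = focal group, item correct        d = focal group, item incorrect
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def mh_delta(alpha):
    """ETS delta scale: negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha)


# Hypothetical counts for two score strata (not data from the study).
example_strata = [(40, 10, 30, 20), (30, 20, 20, 30)]
alpha = mantel_haenszel_odds_ratio(example_strata)
delta = mh_delta(alpha)
```

An odds ratio of 1 (delta of 0) indicates no DIF. Here the made-up counts favor the reference group in both strata, so `alpha` exceeds 1 and `delta` is negative.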


2011 ◽  
Vol 35 (8) ◽  
pp. 604-622 ◽  
Author(s):  
Hirotaka Fukuhara ◽  
Akihito Kamata

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into account, thus estimating DIF magnitude appropriately when a test is composed of testlets. A fully Bayesian estimation method was adopted for parameter estimation. The recovery of parameters was evaluated for the proposed DIF model. Simulation results revealed that the proposed bifactor MIRT DIF model produced better estimates of DIF magnitude and higher DIF detection rates than the traditional IRT DIF model for all simulation conditions. A real data analysis was also conducted by applying the proposed DIF model to a statewide reading assessment data set.


Psych ◽  
2020 ◽  
Vol 2 (1) ◽  
pp. 44-51
Author(s):  
Vladimir Shibaev ◽  
Andrei Grigoriev ◽  
Ekaterina Valueva ◽  
Anatoly Karlin

National IQ estimates are based on psychometric measurements carried out in a variety of cultural contexts and are often obtained from Raven’s Progressive Matrices tests. In a series of studies, J. Philippe Rushton et al. have argued that these tests are not biased with respect to ethnicity or race. Critics claimed their methods were inappropriate and suggested differential item functioning (DIF) analysis as a more suitable alternative. In the present study, we conduct a DIF analysis on Raven’s Standard Progressive Matrices Plus (SPM+) tests administered to convenience samples of Yakuts and ethnic Russians. The Yakuts scored lower than the Russians by 4.8 IQ points, a difference that can be attributed to the selectiveness of the Russian sample. Data from the Yakut (n = 518) and Russian (n = 956) samples were analyzed for DIF using logistic regression. Although items B9, B10, B11, B12, and C11 were identified as having uniform DIF, all of these DIF effects can be regarded as negligible (R² < 0.13). This is consistent with Rushton et al.’s arguments that the Raven’s Progressive Matrices tests are ethnically unbiased.
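As background for the logistic regression DIF analysis described above, the commonly used formulation is written out below. This is a sketch of the standard approach, and the abstract does not confirm that the authors used exactly this specification. The symbols are assumptions: $Y_{ij}$ is the scored response of person $j$ to item $i$, $\theta_j$ is the matching variable (typically the total score), and $G_j$ is a group indicator.

```latex
\begin{align*}
\text{Model 1:}\quad & \operatorname{logit} P(Y_{ij}=1) = \beta_{0i} + \beta_{1i}\,\theta_j \\
\text{Model 2:}\quad & \operatorname{logit} P(Y_{ij}=1) = \beta_{0i} + \beta_{1i}\,\theta_j + \beta_{2i}\,G_j \\
\text{Model 3:}\quad & \operatorname{logit} P(Y_{ij}=1) = \beta_{0i} + \beta_{1i}\,\theta_j + \beta_{2i}\,G_j + \beta_{3i}\,\theta_j G_j
\end{align*}
```

Under this formulation, uniform DIF corresponds to $\beta_{2i} \neq 0$ and nonuniform DIF to $\beta_{3i} \neq 0$. The effect size is usually taken as the gain in a pseudo-$R^2$ between nested models, for example $\Delta R^2 = R^2_{\text{Model 2}} - R^2_{\text{Model 1}}$ for uniform DIF, and a value below 0.13 is conventionally read as negligible.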

