Examining the Impact of Differential Item Functioning on Classification Accuracy in Cognitive Diagnostic Models

2019 ◽  
Vol 44 (4) ◽  
pp. 267-281 ◽  
Author(s):  
Justin Paulsen ◽  
Dubravka Svetina ◽  
Yanan Feng ◽  
Montserrat Valdivia

Cognitive diagnostic models (CDMs) are of growing interest in educational research because of the models’ ability to provide diagnostic information regarding examinees’ strengths and weaknesses suited to a variety of content areas. An important step to ensure appropriate uses and interpretations from CDMs is to understand the impact of differential item functioning (DIF). While methods of detecting DIF in CDMs have been identified, there is a limited understanding of the extent to which DIF affects classification accuracy. This simulation study gives practitioners a reference for understanding how different magnitudes and types of DIF interact with CDM item types, group distributions, and sample sizes to influence attribute- and profile-level classification accuracy. The results suggest that attribute-level classification accuracy is robust to DIF of large magnitudes in most conditions, while profile-level classification accuracy is negatively influenced by the inclusion of DIF. Conditions of unequal group distributions and DIF located on simple structure items had the greatest effect in decreasing classification accuracy. The article closes by considering implications of the results and future directions.
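
The distinction between attribute-level and profile-level classification accuracy can be made concrete with a small sketch. The code below is illustrative only and is not taken from the study: the function names and the four example profiles are hypothetical, and accuracy is assumed here to mean simple agreement between true and estimated mastery.

```python
from typing import List, Sequence, Tuple

Profile = Tuple[int, ...]  # 1 = attribute mastered, 0 = not mastered


def attribute_level_accuracy(true: Sequence[Profile], est: Sequence[Profile]) -> List[float]:
    """Proportion of examinees classified correctly on each attribute."""
    n_attrs = len(true[0])
    return [
        sum(t[k] == e[k] for t, e in zip(true, est)) / len(true)
        for k in range(n_attrs)
    ]


def profile_level_accuracy(true: Sequence[Profile], est: Sequence[Profile]) -> float:
    """Proportion of examinees whose whole profile is recovered exactly."""
    return sum(t == e for t, e in zip(true, est)) / len(true)


# Hypothetical true and estimated profiles: four examinees, three attributes.
true_profiles = [(1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0)]
est_profiles = [(1, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 0)]

attr_acc = attribute_level_accuracy(true_profiles, est_profiles)
prof_acc = profile_level_accuracy(true_profiles, est_profiles)
```

Because a profile counts as correct only when every attribute in it is correct, a single misclassified attribute lowers profile-level accuracy more than it lowers any one attribute-level figure. That is one plausible reading of why the profile-level results are the more sensitive of the two.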

2009 ◽  
Vol 15 (5) ◽  
pp. 758-768 ◽  
Author(s):  
OTTO PEDRAZA ◽  
NEILL R. GRAFF-RADFORD ◽  
GLENN E. SMITH ◽  
ROBERT J. IVNIK ◽  
FLOYD B. WILLIS ◽  
...  

Scores on the Boston Naming Test (BNT) are frequently lower for African American adults than for Caucasian adults. Although demographically based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo’s Older Americans and Older African Americans Normative Studies. Under a two-parameter logistic item response theory framework and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Of these 12 items, 6 (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance. (JINS, 2009, 15, 758–768.)
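
As a minimal sketch of what "conditional probability of responding correctly" means under a two-parameter logistic model, the snippet below evaluates the standard 2PL response function for two groups at the same ability level. The item parameters are hypothetical and are not estimates from the BNT data.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters: equal discrimination (a), but the item is
# harder (larger b) in the focal group than in the reference group.
a = 1.2
b_reference = 0.0
b_focal = 0.5

theta = 0.0  # two examinees with the same latent ability
p_ref = p_correct_2pl(theta, a, b_reference)
p_foc = p_correct_2pl(theta, a, b_focal)

# A nonzero gap at equal ability is what a DIF analysis looks for.
dif_gap = p_ref - p_foc
```

A difference in total scores alone cannot separate an ability difference from item-level bias. Conditioning on theta, as above, is what isolates the item effect.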


2014 ◽  
Vol 27 (7) ◽  
pp. 1099-1111 ◽  
Author(s):  
Anne E. Mueller ◽  
Daniel L. Segal ◽  
Brandon Gavett ◽  
Meghan A. Marty ◽  
Brian Yochim ◽  
...  

Background: The Geriatric Anxiety Scale (GAS; Segal, D. L., June, A., Payne, M., Coolidge, F. L. and Yochim, B. (2010). Journal of Anxiety Disorders, 24, 709–714. doi:10.1016/j.janxdis.2010.05.002) is a self-report measure of anxiety that was designed to address unique issues associated with anxiety assessment in older adults. This study is the first to use item response theory (IRT) to examine the psychometric properties of a measure of anxiety in older adults. Method: A large sample of older adults (n = 581; mean age = 72.32 years, SD = 7.64 years, range = 60 to 96 years; 64% women; 88% European American) completed the GAS. IRT properties were examined. The presence of differential item functioning (DIF) or measurement bias by age and sex was assessed, and a ten-item short form of the GAS (called the GAS-10) was created. Results: All GAS items had discrimination parameters of 1.07 or greater. Items from the somatic subscale tended to have lower discrimination parameters than items on the cognitive or affective subscales. Two items were flagged for DIF, but the impact of the DIF was negligible. Women scored significantly higher than men on the GAS and its subscales. Participants in the young-old group (60 to 79 years old) scored significantly higher on the cognitive subscale than participants in the old-old group (80 years old and older). Conclusions: Results from the IRT analyses indicated that the GAS and GAS-10 have strong psychometric properties among older adults. We conclude by discussing implications and future research directions.


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Elahe Allahyari ◽  
Peyman Jafari ◽  
Zahra Bagheri

Objective. The present study uses simulated data to find the optimal number of response categories for achieving adequate power in the ordinal logistic regression (OLR) model for differential item functioning (DIF) analysis in psychometric research. Methods. A hypothetical ten-item quality of life scale with three, four, and five response categories was simulated. The power and type I error rates of the OLR model for detecting uniform DIF were investigated under different combinations of ability distribution (θ), sample size, sample size ratio, and the magnitude of uniform DIF across reference and focal groups. Results. When θ was distributed identically in the reference and focal groups, increasing the number of response categories from 3 to 5 resulted in an increase of approximately 8% in the power of the OLR model for detecting uniform DIF. The power of OLR was less than 0.36 when the ability distribution in the reference and focal groups was highly skewed to the left and right, respectively. Conclusions. The clearest conclusion from this research is that the minimum number of response categories for DIF analysis using OLR is five. However, the impact of the number of response categories in detecting DIF was lower than might be expected.
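
To illustrate what uniform DIF looks like for an ordinal item, the sketch below computes category probabilities under a proportional-odds (cumulative logit) model, the model family on which OLR is based. It is not the authors' simulation design: the cutpoints, coefficients, and function names are assumptions chosen for illustration, and no model is fitted.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(theta, group, cutpoints, beta_theta=1.0, beta_group=0.0):
    """Category probabilities under a proportional-odds model.

    P(Y >= k) = logistic(beta_theta * theta + beta_group * group - cutpoints[k - 1])
    group is 0 (reference) or 1 (focal); a nonzero beta_group is uniform DIF.
    """
    eta = beta_theta * theta + beta_group * group
    # Cumulative probabilities P(Y >= k), padded with 1 (k = 0) and 0 (k = K).
    cum = [1.0] + [logistic(eta - c) for c in cutpoints] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical five-category item (scores 0 to 4): four increasing cutpoints.
cutpoints = [-1.5, -0.5, 0.5, 1.5]
ref = category_probs(theta=0.0, group=0, cutpoints=cutpoints, beta_group=-0.5)
foc = category_probs(theta=0.0, group=1, cutpoints=cutpoints, beta_group=-0.5)

# Expected item scores for two respondents with the same theta.
expected_ref = sum(k * p for k, p in enumerate(ref))
expected_foc = sum(k * p for k, p in enumerate(foc))
```

In an OLR DIF analysis, testing for uniform DIF amounts to testing whether the group coefficient (beta_group in this sketch) differs from zero after conditioning on the ability measure.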


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 447-447
Author(s):  
Nadia Chu ◽  
Alden Gross ◽  
Xiaomeng Chen ◽  
Qian-Li Xue ◽  
Karen Bandeen-Roche ◽  
...  

Frailty is commonly measured for clinical risk stratification during transplant evaluation and is more prevalent among older, non-White kidney transplant (KT) patients. However, group differences may be partially attributable to misclassification resulting from measurement bias (differential item functioning/DIF). We examined the extent to which DIF affects estimates of age, sex, and race differences in frailty (physical frailty phenotype/PFP) prevalence among 4,300 candidates and 1,396 recipients. We used Multiple Indicators Multiple Causes with dichotomous indicators to assess uniform DIF in PFP criteria attributable to age (≥65 vs. 18-64 years), sex, and race (Black vs. White). Among candidates (mean age=55 years), 41% were female, 46% were Black, and 19% were frail. After controlling for mean frailty level, females were more likely to endorse exhaustion (OR=1.20, p=0.003), but less likely to endorse low activity (OR=0.83, p=0.01). Younger candidates were more likely to endorse weight loss (OR=1.30, p=0.005), exhaustion (OR=1.60, p<0.001), and low activity (OR=1.80, p<0.001). Black candidates were more likely to endorse exhaustion (OR=1.25, p<0.001), but less likely to endorse weakness (OR=0.79, p<0.001). Among recipients (mean age=54 years), 40% were female, 39% were Black, and 15% were frail. Younger recipients were more likely to endorse weight loss (OR=1.55, p=0.005) and low activity (OR=1.61, p=0.02); however, no DIF was detected by sex or race. Results highlight the impact of DIF for specific PFP measures by age, sex, and race among candidates, but only by age for recipients. Further research is needed to ascertain whether candidate- and/or recipient-specific thresholds to correct for DIF could improve risk prediction and equitable access to KT for older, female, and Black candidates.


2018 ◽  
Vol 43 (4) ◽  
pp. 255-271 ◽  
Author(s):  
Dongbo Tu ◽  
Shiyu Wang ◽  
Yan Cai ◽  
Jeff Douglas ◽  
Hua-Hua Chang

Attribute hierarchy is a common assumption in the educational context, where the mastery of one attribute is assumed to be a prerequisite to the mastery of another one. The attribute hierarchy can be incorporated through a restricted Q matrix that implies the specified structure. The latent class–based cognitive diagnostic models (CDMs) usually do not assume a hierarchical structure among attributes, which means all profiles of attributes are possible in a population of interest. This study investigates how different estimation methods affect classification accuracy for a family of CDMs when they are combined with a restricted Q-matrix design. A simulation study is used to explain the misclassification caused by an unrestricted estimation procedure. The advantages of the restricted estimation procedure utilizing attribute hierarchies for increased classification accuracy are further illustrated through a real data analysis on a syllogistic reasoning diagnostic assessment. This research can provide guidelines for educational and psychological researchers and practitioners who use CDMs to analyze data with a restricted Q-matrix design, and can make them aware of the potentially contaminated classification results that arise when attribute hierarchies are ignored.
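
A minimal sketch of how an attribute hierarchy narrows the set of possible profiles is shown below. It only enumerates the profiles consistent with a set of prerequisite relations. It does not implement the estimation procedures compared in the study, and the function name and the three-attribute linear hierarchy are hypothetical.

```python
from itertools import product


def permissible_profiles(n_attributes, prerequisites):
    """Return attribute profiles consistent with a hierarchy.

    prerequisites is a list of (required, dependent) index pairs: the
    dependent attribute can be mastered only if the required one is mastered.
    """
    profiles = []
    for profile in product((0, 1), repeat=n_attributes):
        if all(profile[req] >= profile[dep] for req, dep in prerequisites):
            profiles.append(profile)
    return profiles


# Hypothetical linear hierarchy over three attributes: A0 -> A1 -> A2.
linear = permissible_profiles(3, [(0, 1), (1, 2)])

# With no prerequisites, all 2**3 profiles are possible.
unrestricted = permissible_profiles(3, [])
```

Under this linear hierarchy only 4 of the 8 profiles remain. An estimation procedure that still assigns examinees to any of the 8 can place them in profiles the hierarchy rules out, which is the kind of misclassification the abstract attributes to unrestricted estimation.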

