Factors Affecting Differential Item Functioning Within the Framework of Cognitive Diagnostic Models: A Comparison of Three Methods

Cognitive diagnostic models (CDMs) are of growing interest in educational research because of the models’ ability to provide diagnostic information regarding examinees’ strengths and weaknesses suited to a variety of content areas. An important step to ensure appropriate uses and interpretations from CDMs is to understand the impact of differential item functioning (DIF). While methods of detecting DIF in CDMs have been identified, there is a limited understanding of the extent to which DIF affects classification accuracy. This simulation study provides a reference to practitioners to understand how different magnitudes and types of DIF interact with CDM item types and group distributions and sample sizes to influence attribute- and profile-level classification accuracy. The results suggest that attribute-level classification accuracy is robust to DIF of large magnitudes in most conditions, while profile-level classification accuracy is negatively influenced by the inclusion of DIF. Conditions of unequal group distributions and DIF located on simple structure items had the greatest effect in decreasing classification accuracy. The article closes by considering implications of the results and future directions.
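The abstract does not define its two accuracy measures. A minimal sketch, assuming the common definitions: attribute-level accuracy is the share of individual attribute classifications that are correct, and profile-level accuracy is the share of examinees whose entire attribute profile is correct. The function names and the toy profiles are hypothetical and are not taken from the study.

```python
def attribute_accuracy(true_profiles, est_profiles):
    """Proportion of single attribute classifications that match (assumed definition)."""
    total = sum(len(t) for t in true_profiles)
    hits = sum(
        int(a == b)
        for t, e in zip(true_profiles, est_profiles)
        for a, b in zip(t, e)
    )
    return hits / total


def profile_accuracy(true_profiles, est_profiles):
    """Proportion of examinees whose whole profile matches (assumed definition)."""
    hits = sum(int(tuple(t) == tuple(e)) for t, e in zip(true_profiles, est_profiles))
    return hits / len(true_profiles)


# Hypothetical toy data: 4 examinees, 3 binary attributes each (1 = mastered).
true_profiles = [(1, 0, 1), (0, 0, 1), (1, 1, 1), (0, 1, 0)]
est_profiles = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 1, 0)]

attr_acc = attribute_accuracy(true_profiles, est_profiles)
prof_acc = profile_accuracy(true_profiles, est_profiles)
print(f"attribute-level: {attr_acc:.3f}, profile-level: {prof_acc:.3f}")
```

Under these definitions one misclassified attribute makes the whole profile wrong, so profile-level accuracy can never be higher than attribute-level accuracy. This fits the reported pattern that profile-level accuracy is the more sensitive of the two to DIF.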


A Comparison of Differential Item Functioning Detection Methods in Cognitive Diagnostic Models

Frontiers in Psychology ◽ 10.3389/fpsyg.2019.01137 ◽ 2019 ◽ Vol 10

Author(s): Yanlou Liu ◽ Hao Yin ◽ Tao Xin ◽ Laicheng Shao ◽ Lu Yuan

Keyword(s): Differential Item Functioning ◽ Detection Methods ◽ Item Functioning ◽ Diagnostic Models ◽ Cognitive Diagnostic Models


FACTORS AFFECTING DIFFERENTIAL ITEM FUNCTIONING FOR BLACK EXAMINEES ON SCHOLASTIC APTITUDE TEST ANALOGY ITEMS

ETS Research Report Series ◽ 10.1002/j.2330-8516.1987.tb00227.x ◽ 1987 ◽ Vol 1987 (1) ◽ pp. i-46 ◽ Cited By ~ 26

Author(s): Alicia P. Schmitt ◽ Carole A. Bleistein

Keyword(s): Differential Item Functioning ◽ Scholastic Aptitude Test ◽ Aptitude Test ◽ Factors Affecting ◽ Scholastic Aptitude ◽ Item Functioning


Erfassung von mathematischen Kompetenzen im Vorschulalter mit MARKO-D [Assessing Mathematical Competencies at Preschool Age With MARKO-D]

Diagnostica ◽ 10.1026/0012-1924/a000258 ◽ 2021 ◽ Vol 67 (1) ◽ pp. 13-23

Author(s): Ariana Garrote ◽ Elisabeth Moser Opitz

Keyword(s): Differential Item Functioning ◽ Item Functioning

Abstract. In this study, the test MARKO-D (Mathematik- und Rechenkonzepte im Vorschulalter–Diagnose) was trialed with a sample of children from German-speaking Switzerland (N = 555) in their first and second year of kindergarten, and it was analyzed whether the age norms of the German sample can be transferred to Switzerland. In addition, the test was examined with a subsample (n = 87) for measurement invariance over time. The results of the unidimensional Rasch model show that the instrument is suitable for Switzerland. However, test performance depends on kindergarten attendance. For Switzerland, norms per kindergarten half-year would therefore need to be used in addition to age norms. The differential item functioning analysis showed that 17 of 55 items are affected by substantial measurement non-invariance over time. For the instrument to be usable in longitudinal studies, it would need further development.


Differential Item Functioning in Brief Instruments of Disordered Eating

European Journal of Psychological Assessment ◽ 10.1027/1015-5759/a000472 ◽ 2019 ◽ Vol 35 (6) ◽ pp. 823-833 ◽ Cited By ~ 4

Author(s): Desiree Thielemann ◽ Felicitas Richter ◽ Bernd Strauss ◽ Elmar Braehler ◽ Uwe Altmann ◽ ...

Keyword(s): Differential Item Functioning ◽ Disordered Eating ◽ Structural Equation ◽ Young Female ◽ Eating Attitudes ◽ Equation Model ◽ German Population ◽ Test Fairness ◽ Item Functioning ◽ Multiple Indicator

Abstract. Most instruments for the assessment of disordered eating were developed and validated in young female samples. However, they are often used in heterogeneous general population samples. Therefore, brief instruments of disordered eating should assess the severity of disordered eating equally well across individuals of different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population (N = 2,527) using a multigroup item response theory (IRT) approach and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in either questionnaire. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males, given the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF with respect to the SCOFF seemed to be negligible. Both questionnaires are equally fair across people of different age and SES. The DIF by gender found for the EAT-8 as a screening instrument may also be reflected in the use of different cutoff values for men and women. In general, both brief instruments assessing disordered eating revealed their strengths and limitations concerning test fairness for different groups.


An IRT Investigation of the Validity of Non-Patient Analogue Research Using the Beck Depression Inventory

European Journal of Psychological Assessment ◽ 10.1027/1015-5759.11.1.14 ◽ 1995 ◽ Vol 11 (1) ◽ pp. 14-20 ◽ Cited By ~ 22

Author(s): Sean M. Hammond

Keyword(s): Differential Item Functioning ◽ Beck Depression Inventory ◽ Rating Scale ◽ Latent Trait ◽ Scale Model ◽ Item Functioning ◽ Two Samples ◽ Patient Groups ◽ Analogue Research ◽ Rating Scale Model

This paper presents an IRT analysis of the Beck Depression Inventory (BDI), carried out to assess the assumption of an underlying latent trait common to non-clinical and patient samples. A one-parameter rating scale model was fitted to data drawn from a patient and a non-patient sample. Findings suggest that while the BDI fits the model reasonably well for the two samples separately, there is sufficient differential item functioning to raise serious doubts about the viability of using it analogously with patient and non-patient groups.
