Methodological Issues in Using Structural Equation Models for Testing Differential Item Functioning

2018 ◽  
pp. 65-94 ◽  
Author(s):  
Jaehoon Lee ◽  
Todd D. Little ◽  
Kristopher J. Preacher
2019 ◽  
Vol 35 (6) ◽  
pp. 823-833 ◽  
Author(s):  
Desiree Thielemann ◽  
Felicitas Richter ◽  
Bernd Strauss ◽  
Elmar Braehler ◽  
Uwe Altmann ◽  
...  

Abstract. Most instruments for the assessment of disordered eating were developed and validated in young female samples. However, they are often used in heterogeneous general population samples. Therefore, brief instruments of disordered eating should assess the severity of disordered eating equally well across individuals of different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population (N = 2,527) using a multigroup item response theory (IRT) approach and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in either questionnaire. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males, given the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF with respect to the SCOFF seemed to be negligible. Both questionnaires are equally fair across people of different age and SES. The DIF by gender that we found for the EAT-8 as a screening instrument may also be reflected in the use of different cutoff values for men and women. In general, both brief instruments assessing disordered eating revealed their strengths and limitations concerning test fairness for different groups.


Assessment ◽  
2017 ◽  
Vol 26 (6) ◽  
pp. 1001-1013 ◽  
Author(s):  
David C. Cicero ◽  
Elizabeth A. Martin ◽  
Alexander Krieg

The Wisconsin Schizotypy Scales, including their brief versions, are among the most commonly used self-report measures of schizotypy. Although they have been used extensively in many ethnic groups, few studies have examined their differential item functioning (DIF) across groups. The current study included 1,056 Asian, 408 White, 476 Multiethnic, and 372 Hispanic undergraduates. Unidimensional models of the brief Magical Ideation Scale and Perceptual Aberration Scales fit the data well. For both scales, global tests of measurement invariance provided mixed evidence, but few of the items displayed DIF across ethnicities or between sexes within a multiple indicator multiple causes model. For the full versions of the scales and the brief Revised Social Anhedonia Scale, multiple indicator multiple causes models within an exploratory structural equation modeling framework found that few of the items had DIF. These findings suggest that some of the items may have different psychometric properties across groups, but most items do not.


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Theresa Rohm ◽  
Claus H. Carstensen ◽  
Luise Fischer ◽  
Timo Gnambs

Abstract. Background: After elementary school, students in Germany are separated into different school tracks (i.e., school types) with the aim of creating homogeneous student groups in secondary school. Consequently, the development of students’ reading achievement diverges across school types. Findings on this achievement gap have been criticized as depending on the quality of the administered measure. Therefore, the present study examined to what degree differential item functioning affects estimates of the achievement gap in reading competence. Methods: Using data from the German National Educational Panel Study, reading competence was investigated across three timepoints during secondary school: in grades 5, 7, and 9 (N = 7276). First, using the invariance alignment method, measurement invariance across school types was tested. Then, multilevel structural equation models were used to examine whether a lack of measurement invariance between school types affected the results regarding reading development. Results: Our analyses revealed some measurement non-invariant items that did not alter the patterns of competence development found among school types in the longitudinal modeling approach. However, misleading conclusions about the development of reading competence in different school types emerged when the hierarchical data structure (i.e., students being nested in schools) was not taken into account. Conclusions: We assessed the relevance of measurement invariance and accounting for clustering in the context of longitudinal competence measurement. Even though differential item functioning between school types was found for each measurement occasion, taking these differences in item estimates into account did not alter the parallel pattern of reading competence development across German secondary school types. However, ignoring the clustered data structure of students being nested within schools led to an overestimation of the statistical significance of school type effects.


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 658-658
Author(s):  
Oliver Schilling ◽  
Anna Lücke ◽  
Martin Katzorreck ◽  
Ute Kunzmann ◽  
Denis Gerstorf

Abstract. Geropsychological research has increasingly used intensive longitudinal assessments of momentary affect to address affective aging. In particular, many studies have employed negative emotion item lists for ambulatory assessments of negative affect. However, frequent self-reports on emotion items within short time intervals might change alertness towards and perception of one’s emotional experiences. From an item-response-theoretic point of view, this might impair the stability of item functioning in terms of item discrimination between levels of affectivity and item severity (difficulty). Thus, we examined measurement invariance of negative emotion items commonly used for ambulatory assessments of negative affect. Ambulatory assessments from the EMIL study, obtained over seven consecutive days at six occasions per day from 123 young-old (aged 66-69) and 47 old-old (aged 86-89) adults, were analyzed. Respondents self-reported on 13 negative emotion items, using a 0-100 slider to express the degree to which they felt the respective emotion. We ran multilevel structural equation models with Bayes estimation to analyze variability of negative affect factor loadings, item intercepts, and measurement error variances across repeated measures, thus checking for metric, scalar, and strict factorial invariance. For all sets of parameters, the findings do not strongly support measurement invariance, but point to partial invariance for item subsets. Drawing on literature suggesting that criteria for invariance testing should not be too restrictive to meet pragmatic measurement equivalence requirements, further analyses and our conclusions focus on strategies that might allow for acceptable degrees of differential item functioning, enabling reliable analyses of intra-individual short-term variability in negative affect.


Diagnostica ◽  
2021 ◽  
Vol 67 (1) ◽  
pp. 13-23
Author(s):  
Ariana Garrote ◽  
Elisabeth Moser Opitz

Abstract. In this study, the MARKO-D test (Mathematik- und Rechenkonzepte im Vorschulalter–Diagnose; mathematics and arithmetic concepts at preschool age, diagnosis) was trialed with a sample of children from German-speaking Switzerland (N = 555) in their first and second year of kindergarten, and it was analyzed whether the age norms of the German sample can be transferred to Switzerland. In addition, the test was examined for measurement invariance over time in a subsample (n = 87). The results of the unidimensional Rasch model show that the instrument is suitable for Switzerland. However, test performance depends on kindergarten attendance. For Switzerland, norms per kindergarten half-year would therefore have to be used in addition to age norms. The differential item functioning analysis showed that 17 of 55 items are affected by large measurement non-invariance over time. For the instrument to be usable in longitudinal studies, it would have to be developed further.


2000 ◽  
Vol 16 (1) ◽  
pp. 31-43 ◽  
Author(s):  
Claudio Barbaranelli ◽  
Gian Vittorio Caprara

Summary: The aim of the study is to assess the construct validity of two different measures of the Big Five, matching two “response modes” (phrase-questionnaire and list of adjectives) and two sources of information or raters (self-report and other ratings). Two hundred subjects, equally divided between males and females, were administered the self-report versions of the Big Five Questionnaire (BFQ) and the Big Five Observer (BFO), a list of bipolar pairs of adjectives (Caprara, Barbaranelli, & Borgogni, 1993, 1994). Every subject was rated by six acquaintances, whose ratings were then aggregated, by means of the same instruments used for the self-report, but worded in a third-person format. The multitrait-multimethod matrix derived from these measures was then analyzed via Structural Equation Models according to the criteria proposed by Widaman (1985), Marsh (1989), and Bagozzi (1994). In particular, four different models were compared. While the global fit indexes of the models were only moderate, convergent and discriminant validities were clearly supported, and method and error variance were moderate or low.


2020 ◽  
Vol 41 (4) ◽  
pp. 207-218
Author(s):  
Mihaela Grigoraș ◽  
Andreea Butucescu ◽  
Amalia Miulescu ◽  
Cristian Opariuc-Dan ◽  
Dragoș Iliescu

Abstract. Given the fact that most of the dark personality measures are developed based on data collected in low-stake settings, the present study addresses the appropriateness of their use in high-stake contexts. Specifically, we examined item- and scale-level differential functioning of the Short Dark Triad (SD3; Paulhus & Jones, 2011) measure across testing contexts. The Short Dark Triad was administered to applicant (N = 457) and non-applicant (N = 592) samples. Item- and scale-level invariances were tested using an Item Response Theory (IRT)-based approach and a Structural Equation Modeling (SEM) approach, respectively. Results show that more than half of the SD3 items were flagged for Differential Item Functioning (DIF), and Exploratory Structural Equation Modeling (ESEM) results supported configural, but not metric invariance. Implications for theory and practice are discussed.


1995 ◽  
Vol 11 (1) ◽  
pp. 14-20 ◽  
Author(s):  
Sean M. Hammond

This paper presents an IRT analysis of the Beck Depression Inventory (BDI), which was carried out to assess the assumption of an underlying latent trait common to non-clinical and patient samples. A one-parameter rating scale model was fitted to data drawn from a patient and a non-patient sample. Findings suggest that, while the BDI fits the model reasonably well for the two samples separately, there is sufficient differential item functioning to raise serious doubts about the viability of using it analogously with patient and non-patient groups.
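
As a rough illustration of what differential item functioning means under a rating scale model, the sketch below computes category probabilities for a single item in the usual Andrich-style parameterization (item location plus category thresholds shared across items). The function names, the threshold values, and the 0.5-logit location shift between groups are hypothetical and are not taken from the paper. The sketch only shows that two respondents with the same trait level receive different expected item scores once the item location differs by group.

```python
import math


def rsm_probs(theta, delta, taus):
    """Category probabilities for one item under a rating scale model.

    theta: person trait level (logits)
    delta: item location (logits)
    taus:  category thresholds shared across items (hypothetical values)
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum.
    cum = [0.0]
    for tau in taus:
        cum.append(cum[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cum)
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, delta, taus):
    """Expected item score (0..m) at trait level theta."""
    return sum(k * p for k, p in enumerate(rsm_probs(theta, delta, taus)))


# Hypothetical four-category item (scores 0-3) with symmetric thresholds.
taus = [-1.0, 0.0, 1.0]
theta = 0.0  # same trait level in both groups

# Assumed DIF: the item is 0.5 logits "easier" to endorse in group B.
score_group_a = expected_score(theta, delta=0.0, taus=taus)
score_group_b = expected_score(theta, delta=-0.5, taus=taus)
print(score_group_a, score_group_b)
```

If the item functioned identically in both groups, the two expected scores would coincide at every trait level; a persistent gap at equal theta is what a DIF analysis is designed to detect.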


2009 ◽  
Vol 14 (4) ◽  
pp. 363-371 ◽  
Author(s):  
Laura Borgogni ◽  
Silvia Dello Russo ◽  
Laura Petitta ◽  
Gary P. Latham

Employees (N = 170) of a City Hall in Italy were administered a questionnaire measuring collective efficacy (CE), perceptions of context (PoC), and organizational commitment (OC). Two facets of collective efficacy were identified, namely group and organizational. Structural equation models revealed that perceptions of top management display a stronger relationship with organizational collective efficacy, whereas employees’ perceptions of their colleagues and their direct superior are related to collective efficacy at the group level. Group collective efficacy had a stronger relationship with affective organizational commitment than did organizational collective efficacy. The theoretical significance of this study is in showing that CE is two-dimensional rather than unidimensional. The practical significance of this finding is that the PoC model provides a framework that public sector managers can use to increase the efficacy of the organization as a whole as well as the individual groups that compose it.


Methodology ◽  
2005 ◽  
Vol 1 (2) ◽  
pp. 81-85 ◽  
Author(s):  
Stefan C. Schmukle ◽  
Jochen Hardt

Abstract. Incremental fit indices (IFIs) are regularly used when assessing the fit of structural equation models. IFIs are based on the comparison of the fit of a target model with that of a null model. For maximum-likelihood estimation, IFIs are usually computed by using the χ2 statistics of the maximum-likelihood fitting function (ML-χ2). However, LISREL recently changed the computation of IFIs. Since version 8.52, IFIs reported by LISREL are based on the χ2 statistics of the reweighted least squares fitting function (RLS-χ2). Although both functions lead to the same maximum-likelihood parameter estimates, the two χ2 statistics reach different values. Because these differences are especially large for null models, IFIs are affected in particular. Consequently, RLS-χ2 based IFIs in combination with conventional cut-off values explored for ML-χ2 based IFIs may lead to a wrong acceptance of models. We demonstrate this point by a confirmatory factor analysis in a sample of 2449 subjects.
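
To make the dependence of incremental fit indices on the null-model χ2 concrete, the sketch below computes three widely used indices (NFI, TLI, CFI) from their textbook formulas. The χ2 values and degrees of freedom are invented for illustration and are not the paper's data, and the code is not LISREL's implementation. For simplicity the target-model χ2 is held fixed and only the null-model χ2 is varied, which is an assumption of this example.

```python
def incremental_fit_indices(chi2_target, df_target, chi2_null, df_null):
    """NFI, TLI, and CFI from target- and null-model chi-square statistics."""
    # Normed fit index: proportional reduction in chi-square relative to the null model.
    nfi = (chi2_null - chi2_target) / chi2_null
    # Tucker-Lewis index: the same comparison on chi-square/df ratios.
    ratio_null = chi2_null / df_null
    ratio_target = chi2_target / df_target
    tli = (ratio_null - ratio_target) / (ratio_null - 1.0)
    # Comparative fit index: based on noncentrality estimates (chi-square minus df).
    num = max(chi2_target - df_target, 0.0)
    den = max(chi2_target - df_target, chi2_null - df_null, 0.0)
    cfi = 1.0 if den == 0.0 else 1.0 - num / den
    return {"NFI": nfi, "TLI": tli, "CFI": cfi}


# Hypothetical values: identical target fit, but a much larger null-model chi-square.
smaller_null = incremental_fit_indices(300.0, 50, chi2_null=2000.0, df_null=66)
larger_null = incremental_fit_indices(300.0, 50, chi2_null=4000.0, df_null=66)

for name in ("NFI", "TLI", "CFI"):
    print(name, round(smaller_null[name], 3), round(larger_null[name], 3))
```

With the target model unchanged, a larger null-model χ2 raises all three indices, so cut-off values calibrated on one χ2 statistic need not carry over to another.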

