Examining Appropriacy of CFI/TLI Cutoff Value in Multiple-group CFA Test of Measurement Invariance to Enhance Accuracy of Test Score Interpretation

2021 ◽  
Author(s):  
Abdolvahab Khademi ◽  
Craig S Wells ◽  
Maria Elena Oliveri

The most common effect size when using a multiple-group confirmatory factor analysis approach to measurement invariance is ΔCFI/TLI with a cutoff value of 0.01 (Cheung & Rensvold, 2002). However, this recommended cutoff value may not be universally appropriate and may be of limited utility for some tests (e.g., measures using dichotomous items, or settings that differ in estimation method, sample size, or model complexity). Moreover, prior cutoff value estimations have often ignored consequences, which can result in using measures that estimate proficiency more accurately for some countries or groups of learners than for others. In this study, we investigate whether the cutoff value proposed by Cheung and Rensvold (ΔCFI/TLI = 0.01) is appropriate across educational measurement contexts. Specifically, we investigate the performance of ΔCFI/TLI in capturing lack of invariance (LOI) at the scalar level in dichotomous items within item response theory on groups whose test characteristic curves differed by 0.5. Simulation results showed that the proposed cutoff value of 0.01 in ΔCFI/TLI was not appropriate to capture LOI under the study conditions, which may result in the misinterpretation of test results or inaccurate inferences.
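The ΔCFI/TLI rule discussed in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the standard textbook definitions of CFI and TLI computed from model and baseline chi-square statistics, and every chi-square value and degree of freedom shown is hypothetical. The function names are also assumptions introduced for illustration.

```python
def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index from model and baseline (null) chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    if d_base == 0.0:
        return 1.0
    return 1.0 - d_model / d_base


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index from model and baseline (null) chi-squares."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def flags_noninvariance(fit_less_constrained, fit_more_constrained, cutoff=0.01):
    """True if the drop in fit after adding constraints exceeds the cutoff."""
    return (fit_less_constrained - fit_more_constrained) > cutoff


# Hypothetical fit statistics for two nested invariance models.
baseline = (2000.0, 90)   # chi-square, df of the baseline model
metric = (150.0, 80)      # loadings constrained equal across groups
scalar = (190.0, 88)      # loadings and intercepts/thresholds constrained

cfi_metric = cfi(*metric, *baseline)
cfi_scalar = cfi(*scalar, *baseline)
print(round(cfi_metric - cfi_scalar, 4), flags_noninvariance(cfi_metric, cfi_scalar))
```

The cutoff is left as a parameter because the abstract's point is that 0.01 may not suit every setting, so a caller can substitute a value calibrated to their own conditions.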

2021 ◽  
pp. 003329412110360
Author(s):  
Qingsong Tan ◽  
Jilin Zou ◽  
Feng Kong

The 5-item Gratitude Questionnaire (GQ-5) is one of the most commonly used instruments to measure dispositional gratitude in adolescents. The purpose of this study was to verify the longitudinal measurement invariance (LMI) and gender measurement invariance (GMI) of the GQ-5 that was administered to an adolescent sample twice over the course of 18 months (N = 669). Single-group confirmatory factor analysis (CFA) was adopted to examine the LMI and multiple-group CFA was conducted to assess the GMI. The results showed that the GQ-5 had strong invariance (i.e., equality of factor patterns, loadings, and intercepts) across time and gender. Validation of latent factor mean differences showed that females had higher gratitude scores than males. In addition, the GQ-5 exhibited good internal consistency indices across time and a moderate stability coefficient was also found across an 18-month time interval in adolescents. In summary, our study showed that LMI and GMI of the GQ-5 are satisfactory and the GQ-5 is a reliable instrument for measuring gratitude in adolescents.


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Daniel Y. T. Fong ◽  
Janet Y. H. Wong ◽  
Edmond P. H. Choi ◽  
K. F. Lam ◽  
C. Kwok

Background: The Short Form 12-item Health Survey (SF-12v2) was originally developed in English, but it is also available in Hong Kong (HK) Chinese. While both language versions had their measurement properties well assessed in their respective populations, their measurement invariance in scores has not been examined. Therefore, we aimed to assess their measurement invariance. Methods: We conducted a cross-sectional study on individuals aged 18 years or older at a university campus. Those who were bilingual in English and Chinese were randomly assigned to self-complete either the standard English or the HK Chinese SF-12v2. Measurement invariance of the two components and eight scales of the SF-12v2 was concluded if the corresponding 90% confidence interval (CI) for the difference between the two language versions entirely fell within the minimal clinically important difference of ±3 units. Multiple-group confirmatory factor analysis (CFA) was also performed. Results: A total of 1013 participants completed the SF-12v2 (496 in English and 517 in HK Chinese), with a mean age of 22 years (range 18–58), and 626 participants (62%) were female. There were no significant differences in demographics. Only the physical and mental components and the mental health (MH) scale had their 90% CIs (0.21 to 1.61, −1.00 to 0.98, and −0.86 to 2.84, respectively) completely fall within the ±3 units. The multiple-group CFA showed partial strict invariance. Conclusions: The English and HK Chinese versions of the SF-12v2 can be used in studies with their two components and MH scores pooled in the analysis.
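The interval-based decision rule in the Methods above can be sketched as below. The abstract does not say how the 90% CIs were computed, so the normal-approximation interval for a difference in means is an assumption, as are the function names. The margin of 3 units and the three reported intervals come from the abstract; the fourth interval is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(group_a, group_b, level=0.90):
    """Normal-approximation CI for mean(group_a) - mean(group_b)."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


def within_margin(ci, margin=3.0):
    """True if the whole interval lies strictly inside (-margin, +margin)."""
    lower, upper = ci
    return -margin < lower and upper < margin


# Intervals reported in the abstract (physical, mental, MH scale).
reported = [(0.21, 1.61), (-1.00, 0.98), (-0.86, 2.84)]
print([within_margin(ci) for ci in reported])

# Hypothetical interval that crosses the upper margin.
print(within_margin((-0.50, 3.40)))
```

Requiring the entire interval to fall inside the margin means that a wide, imprecise interval counts against equivalence, which is the reverse of an ordinary significance test.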


2021 ◽  
Vol 10 (4) ◽  
pp. 2121-2131
Author(s):  
Mustafa Ali ◽  
Mohammed A.

The academic buoyancy scale (ABS) is one of the most widely used instruments for measuring academic buoyancy. To obtain meaningful and valid comparisons across groups using ABS, however, measurement invariance should be ascertained a priori. To that end, we examined its measurement invariance, validity evidence based on relations to other variables, and score reliability using categorical omega across culture and gender among Egyptian and Omani undergraduates. Participants were 345 college students: Egyptian sample (N = 191) and Omani sample (N = 154). To assess measurement invariance across culture and gender, multiple-group confirmatory factor analysis was performed with four successive invariance models: (a) configural, (b) metric, (c) scalar, and (d) residual. Results revealed that the unidimensional baseline model had adequate fit to the data in the full sample. Moreover, measurement invariance was found to hold across culture but not across gender; consequently, the ABS could be used to yield valid cross-cultural comparisons between the Egyptian and Omani students. Conversely, it cannot be used to yield valid inferences related to comparing gender groups within each culture. Validity evidence based on relations to other variables was supported by the significant moderate correlation between ABS and academic achievement (GPA; r = .435 and r = .457, p < .01) for the Egyptian and Omani samples, respectively. With regard to score reliability, categorical omega coefficients were moderate across both samples. Educational and psychological implications, limitations and suggestions for improving the scale are discussed.


2008 ◽  
Vol 24 (2) ◽  
pp. 88-94 ◽  
Author(s):  
Bernad Batinic ◽  
Hans-Georg Wolff ◽  
Christiane M. Haupt

This paper reports the development of a short version of the trendsetting questionnaire (TDS; Batinic, Haupt, & Wieselhuber, 2006). According to Batinic et al., individuals high on trendsetting keep an eye open for new trends and have a broad interest in innovations. They inform a wide range of others and explain to them the value of innovations, and they recommend specific products to friends and acquaintances. Empirical criteria as well as substantive criteria were used to select nine items representing the three subtypes of the trendsetting model: input, throughput, and output. Dimensionality and measurement invariance of the short version (TDS-K) were examined in two offline surveys (N = 2,001 and 948) and two online surveys (N = 4,450 and 12,087). Multiple-group confirmatory factor analyses supported the unidimensionality of the measure and showed that measurement invariance held within each of the administration methods (offline vs. online survey), but only partial invariance held across these methods.


Author(s):  
Diana Rivera-Ottenberger ◽  
Mónica Guzmán-González ◽  
Carlos Calderón ◽  
Sagrario Yárnoz-Yaben ◽  
Priscila Comino

(1) Background: Current research on the factors involved in the adaptation process to divorce or separation has explored cross-cultural differences. An initial step in the cross-cultural field is to investigate whether the measurements applied are comparable in different cultural contexts. The aim of the present study is to test the measurement invariance of the Questionnaire of Forgiveness in Divorce-Separation (CPD-S); (2) Methods: The CPD-S was completed by 556 (M = 44.52, SD = 10.18) and 240 (M = 41.44, SD = 7.87) Chilean and Spanish divorced individuals, respectively. Confirmatory factor analyses in single samples and measurement invariance testing in a multi-group framework were conducted to test the cross-group equivalence; (3) Results: The single-factor structure of the CPD-S was supported in both countries. Measurement invariance analysis demonstrated that the CPD-S had partial scalar measurement invariance; (4) Conclusions: The evidence supports the conclusion that CPD-S operates similarly across both countries. Findings are discussed from a cross-cultural and methodological perspective.


2021 ◽  
Author(s):  
Abdolvahab Khademi

One desirable property of a measurement process or instrument is the maximum invariance of the results across subpopulations with similar distribution of the traits. Determining measurement invariance (MI) is a statistical procedure in which different methods are used given different factors, such as the nature of the data (e.g., continuous or discrete, completeness), sample size, measurement framework (e.g., observed scores, latent variable modeling), and other context-specific factors. To evaluate the statistical results, numerical criteria are often used, derived from theory, simulation, or practice. One statistical method to evaluate MI is multiple-group confirmatory factor analysis (MG-CFA), in which the amount of change in fit indices of nested models, such as the comparative fit index (CFI), the Tucker-Lewis fit index (TLI), and the root mean squared error of approximation (RMSEA), is used to determine whether the lack of invariance is non-trivial. Currently, in the MG-CFA framework for establishing MI, the recommended effect size is a change of less than 0.01 in CFI/TLI measures (Cheung & Rensvold, 2002). However, the recommended cutoff value is a very general index and may not be appropriate under some conditions, such as dichotomous indicators, different estimation methods, different sample sizes, and model complexity. In addition, in determining the cutoff value, the consequences of the lack of invariance have been ignored in the current research. To address these gaps, the present research undertakes to evaluate the appropriateness of the current effect size of a change in CFI or TLI of less than 0.01 in educational measurement settings, where the items are dichotomous, the item response functions follow an item response theory (IRT) model, the estimation method is robust weighted least squares, and the focal and reference groups differ from each other on the IRT scale by 0.5 units (equivalent to ±1 raw score).
A simulation study was performed with five (crossed) factors: percent of differential functioning items, IRT model, IRT a and b parameters, and the sample size. The results of the simulation study showed that the cutoff value of a change in CFI/TLI of less than 0.01 for establishing MI is not appropriate for educational settings under the foregoing conditions.
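To make the IRT setting concrete, the sketch below computes a two-parameter logistic (2PL) item response function and a test characteristic curve (the expected raw score) for a reference group and a focal group. It does not reproduce the study's design: the item parameters are hypothetical, the choice of the 2PL without a scaling constant is an assumption, and differential functioning is modeled simply as a 0.5 shift in the b parameter of some items.

```python
from math import exp


def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected raw score at ability theta."""
    return sum(p_correct(theta, a, b) for a, b in items)


# Hypothetical (a, b) parameters for a four-item test in the reference group.
reference_items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]

# Assumed uniform differential functioning: the first two items are
# 0.5 units harder for the focal group; the other items are unchanged.
focal_items = [
    (a, b + 0.5) if index < 2 else (a, b)
    for index, (a, b) in enumerate(reference_items)
]

for theta in (-1.0, 0.0, 1.0):
    gap = tcc(theta, reference_items) - tcc(theta, focal_items)
    print(theta, round(gap, 3))
```

The printed gap is the expected raw-score difference between two examinees of equal ability, which is the kind of consequence the abstract argues a fit-index cutoff should be sensitive to.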


Assessment ◽  
2018 ◽  
Vol 27 (7) ◽  
pp. 1490-1501 ◽  
Author(s):  
Braden K. Tompke ◽  
Jennie Tang ◽  
Irina I. Oltean ◽  
M. Claire Buchan ◽  
Shannon V. Reaume ◽  
...  

Initial evidence suggests that the WHO Disability Assessment Schedule (WHODAS 2.0) is valid and reliable in general youth populations; however, its psychometric properties in specific subgroups are less established. The primary objective was to test for measurement invariance of the 12-item WHODAS 2.0 in an epidemiological sample of youth aged 15 to 19 years with and without physical or mental conditions. Using data from 1,851 youth in the Canadian Community Health Survey–Mental Health, invariance was tested using multiple-group confirmatory factor analysis. Within-domain item correlations were significant and ordinal coefficient alphas were .91, .94, .93, and .92 for the healthy control, physical, mental, and comorbid groups, respectively. Partial measurement invariance was demonstrated for the WHODAS 2.0, with evidence of noninvariance for item residuals and factor variances related to cognition and participation. While these domain-specific comparisons may be biased, valid comparisons of overall disability across subgroups of youth can be made with confidence.


2018 ◽  
Vol 41 (3) ◽  
pp. 393-399 ◽  
Author(s):  
Janna R. Gordon ◽  
Vanessa L. Malcarne ◽  
Scott C. Roesch ◽  
Richard G. Roetzheim ◽  
Kristen J. Wells

The Pearlin Mastery (PM) Scale is frequently used in health research to assess individuals’ personal mastery or the extent to which they believe they are in control of their own lives. It has been adapted from English into multiple languages including Spanish. However, no studies have assessed the psychometric properties of Spanish translations of the scale. This analysis evaluated structural validity and measurement invariance of the original Spanish translation of the PM Scale in two groups of Spanish-speaking individuals receiving primary care at community clinics in Florida. Confirmatory factor analysis (CFA) indicated that the 5-item version used in the literature yields a unidimensional factor structure as expected; however, multiple-group CFA revealed that the PM Scale items did not load equivalently on the factor across samples. This indicates that the Spanish version of the PM Scale may not measure mastery consistently across groups, possibly due to differences in respondents’ semantic understanding of items or differences in the meaning of the construct itself. Findings suggest that researchers seeking to measure personal mastery in Spanish-speaking participants from diverse cultural backgrounds should consider alternative approaches including the development of new instruments.

