Detection of Sex Differential Item Functioning in the Cornell Critical Thinking Test

2012 ◽  
Vol 28 (3) ◽  
pp. 201-207 ◽  
Author(s):  
Brian F. French ◽  
Brian Hand ◽  
William J. Therrien ◽  
Juan Antonio Valdivia Vazquez

Critical thinking (CT) can be described as the conscious process a person undertakes to explore a situation or a problem from different perspectives. Accurate measurement of CT skills, especially across subgroups, depends in part on the measurement properties of an instrument being invariant, or similar, across those groups. The assessment of item-level invariance is a critical component of building a validity argument to ensure that scores on the Cornell Critical Thinking Test (CCTT) have similar meanings across groups. We used logistic regression to examine differential item functioning (DIF) by sex in the CCTT-Form X. Results suggest that the items function similarly for boys and girls, with only 5.6% (4) of items displaying DIF. This implies that any observed mean differences are not a function of a lack of measurement invariance, and it supports the validity of inferences drawn when comparing boys' and girls' scores on the CCTT.

2015 ◽  
Vol 31 (4) ◽  
pp. 238-246
Author(s):  
Hafize Sahin ◽  
Brian F. French ◽  
Brian Hand ◽  
Murat Gunel

Abstract. Critical thinking is a broad term covering core elements such as reasoning, evaluating, and metacognition that educational systems should transfer to students. The integration of such skills into models of student success is increasing on an international scale. The Cornell Critical Thinking Test (CCTT) is an internationally used tool for assessing critical thinking skills. However, limited validity evidence exists for the translated versions of the instrument to support inferences based on CCTT scores. This study examined the Turkish version of the CCTT. Specifically, translated items were examined for measurement equivalence by determining whether items function differently across students from the United States and Turkey. Differential item functioning (DIF) analysis via logistic regression was employed. Results demonstrated that each subtest contained DIF items, and 10% of the items in the instrument were identified as showing DIF. Mean differences between students in each country were not influenced by these items. A critical content review of the translated items gave insight into why items may function differently.


2021 ◽  
pp. 014662162110428
Author(s):  
Steffi Pohl ◽  
Daniel Schulze ◽  
Eric Stets

When measurement invariance does not hold, researchers aim for partial measurement invariance by identifying anchor items that are assumed to be measurement invariant. In this paper, we build on Bechger and Maris's approach to identifying anchor items. Instead of identifying differential item functioning (DIF)-free items, they propose identifying different sets of items whose item parameters are invariant within the same set. We extend their approach with an additional step that allows for the identification of homogeneously functioning item sets. We evaluate the performance of the extended cluster approach under various conditions and compare it to that of previous approaches, namely the equal-mean-difficulty (EMD) approach and the iterative forward approach. We show that the EMD and iterative forward approaches perform well in conditions with balanced DIF or when DIF is small. In conditions with large and unbalanced DIF, they fail to recover the true group mean differences. With appropriate threshold settings, the cluster approach identified a cluster that yielded unbiased mean difference estimates in all conditions. Compared to previous approaches, the cluster approach accommodates a variety of different assumptions and depicts the uncertainty in the results that stems from the choice of assumption. Using a real data set, we illustrate how the assumptions of the previous approaches may be incorporated into the cluster approach and how the chosen assumption affects the results.
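The anchor-selection idea can be illustrated with a toy sketch: given per-group item difficulty estimates, items whose between-group shifts agree (up to noise) form candidate invariant clusters, and the common shift of the largest cluster estimates the group mean difference. The gap threshold, data, and one-dimensional clustering rule below are illustrative assumptions, not the authors' actual algorithm:

```python
# Toy anchor selection by clustering between-group difficulty shifts.
import numpy as np

def anchor_by_clustering(b_ref, b_foc, gap=0.3):
    """Group items by their difficulty shift (focal minus reference); the
    largest cluster is taken as the invariant (anchor) set, and its mean
    shift as the group mean difference. `gap` is an assumed threshold."""
    shift = np.asarray(b_foc) - np.asarray(b_ref)
    order = np.argsort(shift)
    clusters, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if shift[nxt] - shift[prev] <= gap:
            current.append(nxt)
        else:
            clusters.append(current)
            current = [nxt]
    clusters.append(current)
    anchor = max(clusters, key=len)
    return anchor, shift[anchor].mean()

# 8 invariant items shifted by the true mean difference (0.5 logits),
# plus 2 items with large unbalanced DIF on top of it.
b_ref = np.array([-1.5, -1.0, -0.5, 0.0, 0.3, 0.7, 1.0, 1.4, -0.2, 0.5])
b_foc = b_ref + 0.5
b_foc[-2:] += 1.5  # unbalanced DIF items
anchor, mean_diff = anchor_by_clustering(b_ref, b_foc)
print(len(anchor), round(mean_diff, 2))  # the 8 invariant items; shift 0.5
```

Note how an equal-mean-difficulty rule would average all ten shifts and overestimate the mean difference here, which is the unbalanced-DIF failure mode the abstract describes.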


2013 ◽  
Vol 93 (11) ◽  
pp. 1507-1519 ◽  
Author(s):  
Clayon B. Hamilton ◽  
Bert M. Chesworth

Background: The original 20-item Upper Extremity Functional Index (UEFI) has not undergone Rasch validation.
Objective: The purpose of this study was to determine whether Rasch analysis supports the UEFI as a measure of a single construct (ie, upper extremity function) and whether a Rasch-validated UEFI has adequate reproducibility for individual-level patient evaluation.
Design: This was a secondary analysis of data from a repeated-measures study designed to evaluate the measurement properties of the UEFI over a 3-week period.
Methods: Patients (n=239) with musculoskeletal upper extremity disorders were recruited from 17 physical therapy clinics across 4 Canadian provinces. Rasch analysis of the UEFI measurement properties was performed. If the UEFI did not fit the Rasch model, misfitting patients were deleted, items with poor response structure were corrected, and misfitting items and redundant items were deleted. The impact of differential item functioning on the ability estimate of patients was investigated.
Results: A 15-item modified UEFI was derived to achieve fit to the Rasch model where the total score was supported as a measure of upper extremity function only. The resultant UEFI-15 interval-level scale (0–100, worst to best state) demonstrated excellent internal consistency (person separation index=0.94) and test-retest reliability (intraclass correlation coefficient [2,1]=.95). The minimal detectable change at the 90% confidence interval was 8.1.
Limitations: Patients who were ambidextrous or bilaterally affected were excluded to allow for the analysis of differential item functioning due to limb involvement and arm dominance.
Conclusion: Rasch analysis did not support the validity of the 20-item UEFI. However, the UEFI-15 was a valid and reliable interval-level measure of a single dimension, upper extremity function. Rasch analysis supports using the UEFI-15 in physical therapist practice to quantify upper extremity function in patients with musculoskeletal disorders of the upper extremity.
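A minimal detectable change of this kind follows from the test-retest reliability via the standard error of measurement (SEM = SD × √(1 − reliability), MDC90 = 1.645 × √2 × SEM). A short sketch; the score standard deviation used here is an assumed value chosen for illustration, since the abstract does not report it:

```python
# MDC90 from test-retest reliability via the standard error of measurement.
import math

def mdc90(sd, icc):
    """Minimal detectable change at the 90% confidence level.
    SEM = SD * sqrt(1 - reliability); MDC90 = 1.645 * sqrt(2) * SEM."""
    sem = sd * math.sqrt(1 - icc)
    return 1.645 * math.sqrt(2) * sem

# With ICC = .95 as reported; the SD of 15.6 points (0-100 scale) is an
# illustrative assumption that lands near the paper's reported MDC of 8.1.
print(round(mdc90(15.6, 0.95), 1))
```

The √2 factor reflects that a change score carries the measurement error of two administrations.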


Psych ◽  
2020 ◽  
Vol 2 (1) ◽  
pp. 44-51
Author(s):  
Vladimir Shibaev ◽  
Andrei Grigoriev ◽  
Ekaterina Valueva ◽  
Anatoly Karlin

National IQ estimates are based on psychometric measurements carried out in a variety of cultural contexts and are often obtained from Raven’s Progressive Matrices tests. In a series of studies, J. Philippe Rushton et al. have argued that these tests are not biased with respect to ethnicity or race. Critics claimed their methods were inappropriate and suggested differential item functioning (DIF) analysis as a more suitable alternative. In the present study, we conduct a DIF analysis on Raven’s Standard Progressive Matrices Plus (SPM+) tests administered to convenience samples of Yakuts and ethnic Russians. The Yakuts scored lower than the Russians by 4.8 IQ points, a difference that can be attributed to the selectiveness of the Russian sample. Data from the Yakut (n = 518) and Russian (n = 956) samples were analyzed for DIF using logistic regression. Although items B9, B10, B11, B12, and C11 were identified as showing uniform DIF, all of these DIF effects can be regarded as negligible (R² < 0.13). This is consistent with Rushton et al.’s argument that the Raven’s Progressive Matrices tests are ethnically unbiased.
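The negligibility judgment in studies like this one typically applies Zumbo-Thomas style effect-size cutoffs to the logistic-regression R² measure, so that statistically significant DIF with a small effect is not flagged as substantive. A minimal sketch of that classification, using the commonly cited 0.13 and 0.26 cutoffs:

```python
# Effect-size classification for logistic-regression DIF
# (Zumbo-Thomas style cutoffs: < .13 negligible, .13-.26 moderate, else large).
def classify_dif(r2_effect):
    if r2_effect < 0.13:
        return "negligible"
    if r2_effect < 0.26:
        return "moderate"
    return "large"

for item, effect in [("B9", 0.02), ("C11", 0.05)]:  # illustrative values
    print(item, classify_dif(effect))
```

With large samples (here n = 518 and n = 956), significance tests alone flag trivial effects, which is why the effect-size screen matters.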


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Zhongquan Li ◽  
Xia Zhao ◽  
Ang Sheng ◽  
Li Wang

Abstract
Background: Anxiety symptoms are pervasive among elderly populations around the world. The Geriatric Anxiety Inventory (GAI) has been developed and is widely used to screen those suffering from severe symptoms. Although debates about its dimensionality have been largely resolved by Molde et al. (2019) with bifactor modeling, evidence regarding its measurement invariance across sex and somatic diseases is still missing.
Methods: This study attempted to provide complementary evidence on the dimensionality of the GAI with Mokken scale analysis and to examine its measurement invariance across sex and somatic diseases by conducting differential item functioning (DIF) analysis in a sample of older Chinese adults. The data were drawn from the responses of a large representative sample (N = 1314) in the Chinese National Survey Data Archive, focusing on the mental health of elderly adults.
Results: Mokken scale analysis confirmed the unidimensionality of the GAI, and DIF analysis indicated measurement invariance of the inventory across sex and somatic diseases, with just a few items exhibiting bias, all of it negligible.
Conclusions: These findings support the use of the inventory among older Chinese adults to screen for anxiety symptoms and to make comparisons across sex and somatic diseases.
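Mokken scale analysis rests on Loevinger's scalability coefficient H, which compares observed Guttman errors (passing a harder item while failing an easier one) to the errors expected under independence. A minimal sketch for dichotomous items on a toy data matrix (not the GAI data):

```python
# Loevinger's H for dichotomous item scores (toy data, not the GAI sample).
import numpy as np

def loevinger_h(X):
    """Scalability coefficient H (rows = persons, cols = items):
    1 - (observed Guttman errors) / (errors expected under independence)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    p = X.mean(axis=0)  # proportion passing each item
    obs = exp = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            e, h = (i, j) if p[i] >= p[j] else (j, i)  # e = easier item
            # Guttman error: harder item passed, easier item failed
            obs += np.sum((X[:, h] == 1) & (X[:, e] == 0))
            exp += n * p[h] * (1 - p[e])
    return 1 - obs / exp

X = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]  # perfect Guttman pattern
print(loevinger_h(X))  # a perfect scale gives H = 1.0
```

By Mokken's convention, H ≥ .3 is usually taken as the minimum for a (weak) scale, with H ≥ .5 indicating a strong scale; a unidimensionality claim like the one above rests on such values.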

