Differential Item Functioning
Recently Published Documents


TOTAL DOCUMENTS: 1077 (five years: 248)
H-INDEX: 53 (five years: 5)

2022, pp. 001316442110684
Author(s): Natalie A. Koziol, J. Marc Goodrich, HyeonJin Yoon

Differential item functioning (DIF) is often used to examine validity evidence for alternate-form test accommodations. Unfortunately, traditional approaches to evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study compared the new framework with traditional logistic regression with respect to Type I error and power rates of the uniform DIF test statistics, and with respect to bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias, but suffered from low power and a lack of precision. Implications for practice are discussed.
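To make the benchmark concrete, below is a minimal sketch of the traditional logistic-regression uniform-DIF test that the proposed framework is compared against. The data, effect sizes, and variable names are hypothetical, and the regression-discontinuity framework itself is not reproduced here.

```python
# Minimal sketch: uniform DIF via logistic regression (the traditional
# comparison method). All data below are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 1000
total = rng.normal(0, 1, n)        # matching criterion (e.g., rest score)
group = rng.integers(0, 2, n)      # 0 = reference, 1 = focal
logit = 0.8 * total + 0.4 * group  # built-in uniform DIF for illustration
item = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Compact model: item response predicted by ability only
m0 = sm.Logit(item, sm.add_constant(total)).fit(disp=0)
# Augmented model: add group membership (the uniform DIF term)
m1 = sm.Logit(item, sm.add_constant(np.column_stack([total, group]))).fit(disp=0)

# Likelihood-ratio test with 1 df; significance flags uniform DIF
lr = 2 * (m1.llf - m0.llf)
p = chi2.sf(lr, df=1)
print(f"LR = {lr:.2f}, p = {p:.4f}, group effect = {m1.params[2]:.2f} (log-odds)")
```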


PLoS ONE, 2021, Vol 16 (12), pp. e0261865
Author(s): Linda J. Resnik, Mathew L. Borgia, Melissa A. Clark, Emily Graczyk, Jacob Segil, ...

Recent advances in upper limb prosthetics include sensory restoration techniques and osseointegration technology that introduce additional risks, higher costs, and longer periods of rehabilitation. To inform regulatory and clinical decision making, validated patient reported outcome measures are required to understand the relative benefits of these interventions. The Patient Experience Measure (PEM) was developed to quantify psychosocial outcomes for research studies on sensory-enabled upper limb prostheses. While the PEM was responsive to changes in prosthesis experience in prior studies, its psychometric properties had not been assessed. Here, the PEM was examined for structural validity and reliability across a large sample of people with upper limb loss (n = 677). The PEM was modified and tested in three phases: initial refinement and cognitive testing, pilot testing, and field testing. Exploratory factor analysis (EFA) was used to discover the underlying factor structure of the PEM items and confirmatory factor analysis (CFA) verified the structure. Rasch partial credit modeling evaluated monotonicity, fit, and magnitude of differential item functioning by age, sex, and prosthesis use for all scales. EFA resulted in a seven-factor solution that was reduced to the following six scales after CFA: social interaction, self-efficacy, embodiment, intuitiveness, wellbeing, and self-consciousness. After removal of two items during Rasch analyses, the overall model fit was acceptable (CFI = 0.973, TLI = 0.979, RMSEA = 0.038). The social interaction, self-efficacy and embodiment scales had strong person reliability (0.81, 0.80 and 0.77), Cronbach’s alpha (0.90, 0.80 and 0.71), and intraclass correlation coefficients (0.82, 0.85 and 0.74), respectively. The large sample size and use of contemporary measurement methods enabled identification of unidimensional constructs, differential item functioning by participant characteristics, and the rank ordering of the difficulty of each item in the scales. The PEM enables quantification of critical psychosocial impacts of advanced prosthetic technologies and provides a rigorous foundation for future studies of clinical and prosthetic interventions.
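For readers unfamiliar with the reliability figures quoted above, the sketch below shows how Cronbach's alpha is computed from an item-response matrix; the data are simulated stand-ins, not the PEM responses.

```python
# Minimal sketch: Cronbach's alpha for a single scale. Simulated data.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: n_persons x k_items matrix of scored responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
latent = rng.normal(0, 1, (500, 1))               # one underlying construct
items = latent + rng.normal(0, 0.8, (500, 6))     # 6 items loading on it
print(f"alpha = {cronbach_alpha(items):.2f}")
```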


Crisis, 2021
Author(s): Jenny Mei Yiu Huen, Paul Siu Fai Yip, Augustine Osman, Angel Nga Man Leung

Abstract. Background: Despite the widespread use of the Suicidal Behaviors Questionnaire–Revised (SBQ-R) and advances in item response theory (IRT) modeling, item-level analysis with the SBQ-R has been minimal. Aims: This study extended IRT modeling strategies to examine the response parameters and potential differential item functioning (DIF) of the individual SBQ-R items in samples of US (N = 320) and Chinese (N = 298) undergraduate students. Method: Responses to the items were calibrated using the unidimensional graded response IRT model. Goodness-of-fit, item parameters, and DIF were evaluated. Results: The unidimensional graded response IRT model provided a good fit to the sample data. Results showed that the SBQ-R items had various item discrimination parameters and item severity parameters. Also, each SBQ-R item functioned similarly between the US and Chinese respondents. In particular, Item 1 (history of attempts) demonstrated high discrimination and severity of suicide-related thoughts and behaviors (STBs). Limitations: The use of cross-sectional data from convenience samples of undergraduate students could be considered a major limitation. Conclusion: The findings from the IRT analysis provided empirical support that each SBQ-R item taps into STBs and that scores for Item 1 can be used for screening purposes.
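As an illustration of the calibration model, the sketch below computes category response probabilities under a unidimensional graded response model; the discrimination and threshold values are illustrative, not the estimates reported for the SBQ-R.

```python
# Minimal sketch: graded response model (GRM) category probabilities,
# obtained as differences of cumulative 2PL curves. Illustrative parameters.
import numpy as np

def grm_category_probs(theta, a, b):
    """theta: latent severity; a: discrimination; b: ordered thresholds."""
    # Cumulative probability of responding in category k or higher
    p_star = 1 / (1 + np.exp(-a * (theta[:, None] - np.asarray(b)[None, :])))
    p_star = np.column_stack([np.ones_like(theta), p_star, np.zeros_like(theta)])
    return p_star[:, :-1] - p_star[:, 1:]   # per-category probabilities

theta = np.linspace(-3, 3, 7)
probs = grm_category_probs(theta, a=2.1, b=[-0.5, 0.6, 1.8])  # a "high discrimination" item
print(probs.round(3))   # each row sums to 1 across the 4 response categories
```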


2021, Vol Publish Ahead of Print
Author(s): Jonathan D. Rubright, Michael Jodoin, Stephanie Woodward, Michael A. Barone

2021
Author(s): Luke R Aldridge, Christopher G Kemp, Judith K Bass, Kristen Danforth, Jeremy C Kane, ...

Abstract. Background: Existing implementation measures developed in high-income countries may have limited appropriateness for use within low- and middle-income countries (LMIC). In response, researchers at Johns Hopkins University began developing the Mental Health Implementation Science Tools (mhIST) in 2013 to assess priority implementation determinants and outcomes across four key stakeholder groups – consumers, providers, organization leaders, and policy makers – with dedicated versions of scales for each group. These were field tested and refined in several contexts, and criterion validity was established in Ukraine. The Consumer and Provider mhIST have since grown in popularity in mental health research, outpacing psychometric evaluation. Our objective was to establish the cross-context psychometric properties of these versions and inform future revisions. Methods: We compiled data from seven studies across six LMIC – Colombia, Myanmar, Pakistan, Thailand, Ukraine, and Zambia – to evaluate the psychometric performance of the Consumer and Provider mhIST. We used exploratory factor analysis to identify dimensionality, factor structure, and item loadings for each scale within each stakeholder version. We also used alignment analysis (i.e., multi-group confirmatory factor analysis) to estimate measurement invariance and differential item functioning of the Consumer scales across the six countries. Findings: All but one scale within the Provider and Consumer versions had a Cronbach's alpha greater than 0.8. Exploratory factor analysis indicated most scales were multidimensional, with factors generally aligning with a priori subscales for the Provider version; the Consumer version has no predefined subscales. Alignment analysis of the Consumer mhIST indicated a range of measurement invariance for scales across settings (R² 0.46 to 0.77). Several items were identified for potential revision due to participant non-response or low or cross-factor loadings. We found only one item – which asked consumers whether their intervention provider was available when needed – to have differential item functioning in both intercept and loading. Conclusion: We provide evidence that the Consumer and Provider versions of the mhIST are internally valid and reliable across diverse contexts and stakeholder groups for mental health research in LMIC. We recommend the instrument be revised based on these analyses and that future research examine instrument utility by linking measurement to other outcomes of interest.
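As a concrete illustration of the exploratory factor analysis step, the sketch below recovers a two-factor structure from simulated scale responses using scikit-learn; the items, loadings, and sample are hypothetical stand-ins for the mhIST data.

```python
# Minimal sketch: exploratory factor analysis to inspect scale
# dimensionality, on simulated (not mhIST) responses.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n = 400
f1, f2 = rng.normal(0, 1, (2, n))   # two hypothetical latent factors
X = np.column_stack([f1 + rng.normal(0, 0.6, n) for _ in range(4)] +
                    [f2 + rng.normal(0, 0.6, n) for _ in range(4)])

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
# Loadings: items 1-4 should load on one factor, items 5-8 on the other
print(fa.components_.round(2))
```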


2021, Vol 5 (Supplement_1), pp. 447-447
Author(s): Nadia Chu, Alden Gross, Xiaomeng Chen, Qian-Li Xue, Karen Bandeen-Roche, ...

Abstract. Frailty is commonly measured for clinical risk stratification during transplant evaluation and is more prevalent among older, non-White kidney transplant (KT) patients. However, group differences may be partially attributable to misclassification resulting from measurement bias (differential item functioning, DIF). We examined the extent to which DIF affects estimates of age, sex, and race differences in frailty (physical frailty phenotype, PFP) prevalence among 4,300 candidates and 1,396 recipients. We used Multiple Indicators Multiple Causes (MIMIC) models with dichotomous indicators to assess uniform DIF in PFP criteria attributable to age (≥65 vs. 18-64 years), sex, and race (Black vs. White). Among candidates (mean age = 55 years), 41% were female, 46% were Black, and 19% were frail. After controlling for mean frailty level, females were more likely to endorse exhaustion (OR = 1.20, p = 0.003) but less likely to endorse low activity (OR = 0.83, p = 0.01). Younger candidates were more likely to endorse weight loss (OR = 1.30, p = 0.005), exhaustion (OR = 1.60, p < 0.001), and low activity (OR = 1.80, p < 0.001). Black candidates were more likely to endorse exhaustion (OR = 1.25, p < 0.001) but less likely to endorse weakness (OR = 0.79, p < 0.001). Among recipients (mean age = 54 years), 40% were female, 39% were Black, and 15% were frail. Younger recipients were more likely to endorse weight loss (OR = 1.55, p = 0.005) and low activity (OR = 1.61, p = 0.02); however, no DIF was detected by sex or race. Results highlight the impact of DIF on specific PFP measures by age, sex, and race among candidates, but only by age among recipients. Further research is needed to ascertain whether candidate- and/or recipient-specific thresholds to correct for DIF could improve risk prediction and equitable access to KT for older, female, and Black candidates.
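A full MIMIC model requires dedicated SEM software; as a simplified, conceptually related stand-in, the sketch below regresses each dichotomous criterion on a rest-score proxy for overall frailty plus a covariate, so that a significant covariate effect after conditioning on frailty level mirrors the uniform DIF reported above. All data and effect sizes are simulated.

```python
# Simplified stand-in for the MIMIC uniform-DIF check (not the authors'
# exact model): per-criterion logistic regression on rest score + covariate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 4300
frailty = rng.normal(0, 1, n)
female = rng.integers(0, 2, n)
crit = {}
for name, dif in [("exhaustion", 0.2), ("low_activity", -0.2),
                  ("weakness", 0.0), ("weight_loss", 0.0), ("slowness", 0.0)]:
    p = 1 / (1 + np.exp(-(frailty + dif * female - 1.0)))  # dif = simulated DIF
    crit[name] = rng.binomial(1, p)

items = np.column_stack(list(crit.values()))
for j, name in enumerate(crit):
    rest = items.sum(axis=1) - items[:, j]   # rest score as frailty proxy
    X = sm.add_constant(np.column_stack([rest, female]))
    fit = sm.Logit(items[:, j], X).fit(disp=0)
    print(f"{name:12s} OR(female) = {np.exp(fit.params[2]):.2f}, "
          f"p = {fit.pvalues[2]:.3f}")
```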


2021, pp. 026921552110621
Author(s): Antonio Caronni, Michela Picardi, Valentina Redaelli, Paola Antoniotti, Giuseppe Pintavalle, ...

Objective: To test, with Rasch analysis, the psychometric properties of the Falls Efficacy Scale International, a questionnaire measuring concern about falling. Design: Longitudinal observational study, before–after rehabilitation. Setting: Inpatient rehabilitation. Subjects: A total of 251 neurological patients with balance impairment. Interventions: Physiotherapy and occupational therapy aimed at reducing the risk of falling. Main measures: Participants (median age, first–third quartile: 74.0, 65.5–80.5 years; stroke and polyneuropathy: 43% and 21% of the sample, respectively) received a balance assessment (including the Falls Efficacy Scale International) pre- and post-rehabilitation. Rasch analysis was used to evaluate the Falls Efficacy Scale International. Differential item functioning, which assesses the stability of the measures across conditions (e.g. before vs. after treatment) and across groups of individuals, was tested for several variables. Results: Patients had moderate balance impairment (Mini-BESTest median score, first–third quartile: 15, 11–19), mild-to-moderate concern about falling (Falls Efficacy Scale International: 28, 21–37) and motor disability (Functional Independence Measure, motor domain: 70.0, 57.0–76.5). Falls Efficacy Scale International items fitted the Rasch model (infit and outfit mean square statistics ranged 0.80–1.32 and 0.71–1.45, respectively) and the questionnaire's reliability was satisfactory (0.87). No differential item functioning was found for treatment, gender, age or balance impairment. Differential item functioning was found for diagnosis and disability severity, but it was shown to be too small to bias the measures. Conclusions: Falls Efficacy Scale International ordinal scores can be converted into interval measures, i.e. measures of the same kind as temperature readings. Because these measures are free of differential item functioning for treatment, they can safely be used to compare concern about falling before and after rehabilitation, for example when assessing rehabilitation effectiveness.
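For reference, the infit and outfit mean-square statistics quoted above can be computed from person and item estimates as in the sketch below. It uses a dichotomous Rasch simplification with simulated data, whereas the Falls Efficacy Scale International is actually polytomous.

```python
# Minimal sketch: Rasch infit/outfit mean squares from given person measures
# (theta) and item difficulties (beta). Dichotomous simplification, simulated data.
import numpy as np

def rasch_fit(responses, theta, beta):
    """responses: n x k 0/1 matrix; values near 1 indicate good item fit."""
    p = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))  # expected scores
    var = p * (1 - p)
    z2 = (responses - p) ** 2 / var                 # squared standardized residuals
    outfit = z2.mean(axis=0)                        # unweighted mean square
    infit = ((responses - p) ** 2).sum(axis=0) / var.sum(axis=0)  # info-weighted
    return infit, outfit

rng = np.random.default_rng(4)
theta = rng.normal(0, 1, 251)                       # sample size matching the study
beta = np.linspace(-2, 2, 16)                       # 16 items, illustrative difficulties
resp = rng.binomial(1, 1 / (1 + np.exp(-(theta[:, None] - beta[None, :]))))
infit, outfit = rasch_fit(resp, theta, beta)
print(infit.round(2), outfit.round(2))
```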


2021, Vol 12
Author(s): Linyu Liao, Don Yao

Differential Item Functioning (DIF) analysis is an indispensable methodology for detecting item and test bias in language testing. This study investigated grade-related DIF in the listening section of the General English Proficiency Test-Kids (GEPT-Kids). Quantitative data were test scores collected from 791 test takers (Grade 5 = 398; Grade 6 = 393) in eight Chinese-speaking cities, and qualitative data were expert judgments collected from two primary school English teachers in Guangdong province. Two R packages, "difR" and "difNLR", were used to perform five types of DIF analysis (two-parameter item response theory [2PL IRT] based Lord's chi-square and Raju's area tests, Mantel-Haenszel [MH], logistic regression [LR], and nonlinear regression [NLR] DIF methods) on the test scores, which together identified 16 DIF items. The ShinyItemAnalysis package was employed to draw item characteristic curves (ICCs) for the 16 items in RStudio, which revealed four distinct types of DIF effect. In addition, the two experts identified reasons or sources for the DIF effect of four items. The study may therefore shed light on the sustainable development of test fairness in language testing. Methodologically, the mixed-methods sequential explanatory design can guide further test-fairness research using flexible methods; practically, the results indicate that a DIF flag does not necessarily imply bias. Rather, it serves as an alarm that calls test developers' attention to further examine the appropriateness of the flagged items.
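The abstract's analyses were run in R with difR and difNLR; as a language-neutral illustration, the sketch below re-implements the Mantel-Haenszel common odds ratio (one of the five methods named above) in Python on simulated data, with coarse ability strata standing in for the usual total-score levels.

```python
# Minimal sketch: Mantel-Haenszel common odds ratio for DIF detection,
# re-implemented in Python (not a difR call). Simulated data throughout.
import numpy as np

def mh_odds_ratio(item, group, strata):
    """Common odds ratio across score strata; values far from 1 signal DIF."""
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = np.sum((group[m] == 0) & (item[m] == 1))   # reference, correct
        b = np.sum((group[m] == 0) & (item[m] == 0))   # reference, incorrect
        c = np.sum((group[m] == 1) & (item[m] == 1))   # focal, correct
        d = np.sum((group[m] == 1) & (item[m] == 0))   # focal, incorrect
        n = m.sum()
        num += a * d / n
        den += b * c / n
    return num / den

rng = np.random.default_rng(5)
n = 791                                       # sample size matching the study
grade6 = rng.integers(0, 2, n)                # 0 = Grade 5, 1 = Grade 6
theta = rng.normal(0.3 * grade6, 1)
item = rng.binomial(1, 1 / (1 + np.exp(-(theta + 0.5 * grade6))))  # simulated DIF
strata = np.clip(np.round(theta).astype(int), -2, 2)  # coarse ability strata
print(f"MH odds ratio = {mh_odds_ratio(item, grade6, strata):.2f}")
```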

