differential test functioning
Recently Published Documents

TOTAL DOCUMENTS: 19 (FIVE YEARS: 9)
H-INDEX: 5 (FIVE YEARS: 0)

2021, Vol 7 (1)
Author(s): Philseok Lee, Seang-Hwane Joo

To address faking issues associated with Likert-type personality measures, multidimensional forced-choice (MFC) measures have recently come to light as important components of personnel assessment systems. Despite various efforts to investigate the fake resistance of MFC measures, previous research has mainly focused on scale mean differences between honest and faking conditions. Given the recent psychometric advancements in MFC measures (e.g., Brown & Maydeu-Olivares, 2011; Stark et al., 2005; Lee et al., 2019; Joo et al., 2019), there is a need to investigate the fake resistance of MFC measures through a new methodological lens. This research investigates the fake resistance of MFC measures through recently proposed differential item functioning (DIF) and differential test functioning (DTF) methodologies for MFC measures (Lee, Joo, & Stark, 2020). Overall, our results show that MFC measures are more fake resistant than Likert-type measures at both the item and test levels. However, MFC measures may still be susceptible to faking if they include many mixed blocks, that is, blocks combining positively and negatively keyed statements. Future research may need to identify an optimal strategy for designing mixed blocks in MFC measures that satisfies the goals of validity and scoring accuracy. Practical implications and limitations are discussed in the paper.
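As a generic illustration of how differential test functioning can be summarized at the test level, the sketch below computes signed and unsigned DTF-style indices as the gap between two condition-specific test response functions (honest vs. faking). It does not implement the MFC/Thurstonian IRT models or the Lee, Joo, and Stark (2020) procedure; all curves and values are invented.

```python
# Hedged sketch: DTF as the weighted gap between two test response functions.
import numpy as np

theta = np.linspace(-3, 3, 121)
weights = np.exp(-0.5 * theta**2)          # standard normal weighting over the trait
weights /= weights.sum()

# Hypothetical test response functions (expected total score given the trait)
trf_honest = 20 / (1 + np.exp(-1.1 * theta))
trf_faking = 20 / (1 + np.exp(-1.1 * (theta + 0.15)))   # small upward shift under faking

gap = trf_faking - trf_honest
signed_dtf = np.sum(weights * gap)                 # average direction of the shift
unsigned_dtf = np.sqrt(np.sum(weights * gap**2))   # average magnitude of the shift
print(f"signed DTF = {signed_dtf:.3f}, unsigned DTF = {unsigned_dtf:.3f}")
```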


2021, pp. 001316442110015
Author(s): Dimiter M. Dimitrov, Dimitar V. Atanasov

This study offers an approach to testing for differential item functioning (DIF) in a recently developed measurement framework, referred to as the D-scoring method (DSM). Under the proposed approach, called the P–Z method of testing for DIF, the item response functions of two groups (reference and focal) are compared by transforming their probabilities of correct item response, estimated under the DSM, into Z-scale normal deviates. Using the linear relationship between such Z-deviates, the testing for DIF is reduced to testing two basic statistical hypotheses about equal variances and equal means of the Z-deviates for the reference and focal groups. The results from a simulation study support the efficiency (low Type I error and high power) of the proposed P–Z method. Furthermore, it is shown that the P–Z method is directly applicable to testing for differential test functioning. Recommendations for practical use and future research, including possible applications of the P–Z method in an IRT context, are also provided.
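A minimal numerical sketch of the P–Z idea, assuming the group-specific probabilities of a correct response (which under the DSM would come from model estimation) are already available at a common set of score points. The probability values and the particular test statistics below are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy import stats

# Hypothetical probabilities of a correct response at seven matched score points
p_ref = np.array([0.12, 0.25, 0.41, 0.58, 0.73, 0.85, 0.93])   # reference group
p_foc = np.array([0.10, 0.21, 0.35, 0.52, 0.69, 0.82, 0.91])   # focal group

# Transform the probabilities into Z-scale normal deviates
z_ref = stats.norm.ppf(p_ref)
z_foc = stats.norm.ppf(p_foc)

# With no DIF the two sets of deviates should share the same variance and mean,
# so DIF testing reduces to two familiar hypotheses.
# 1) Equal variances: two-sided F test on the variance ratio
f_stat = np.var(z_ref, ddof=1) / np.var(z_foc, ddof=1)
df = len(z_ref) - 1
p_var = 2 * min(stats.f.cdf(f_stat, df, df), stats.f.sf(f_stat, df, df))

# 2) Equal means: paired t test on the deviates at the matched points
res = stats.ttest_rel(z_ref, z_foc)

print(f"variance test: F = {f_stat:.3f}, p = {p_var:.3f}")
print(f"mean test:     t = {res.statistic:.3f}, p = {res.pvalue:.3f}")
```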


2020, pp. 153450842097683
Author(s): Brian Barger, Emily Graybill, Andrew Roach, Kathleen Lane

This study used item response theory (IRT) methods to investigate group differences in responses to the 12-item Student Risk Screening Scale-Internalizing and Externalizing (SRSS-IE12) in a sample of 3,837 elementary school students. Using factor analysis and graded response models, we examined the factor structure and the item and test functioning of the SRSS-IE12. The SRSS-IE12 internalizing and externalizing factors reflected the hypothesized two-factor model. IRT analyses indicated that SRSS-IE12 items and tests measure internalizing and externalizing traits similarly across students from different race, ethnicity, gender, and elementary level (K-2 vs. 3-5) groups. Moreover, the mostly negligible differential item functioning (DIF) and differential test functioning (DTF) observed suggest that these scales yield equitable trait ratings. Collectively, the results provide further support for the SRSS-IE12 for universal screening in racially diverse elementary schools.
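For readers unfamiliar with the graded response model used in such analyses, the sketch below shows how group-specific GRM calibrations for a single polytomous item can be compared on the expected-score metric; all parameter values are hypothetical and are not taken from the SRSS-IE12 analyses.

```python
import numpy as np

def grm_expected_score(theta, a, thresholds):
    """Expected item score (0..K) under Samejima's graded response model."""
    theta = np.asarray(theta)
    # Cumulative probabilities P(X >= k) for k = 1..K
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - np.asarray(thresholds)[None, :])))
    return p_star.sum(axis=1)  # the expected score equals the sum of cumulative probabilities

theta = np.linspace(-3, 3, 121)
# Hypothetical calibrations of one 4-category item in two rater groups
exp_ref = grm_expected_score(theta, a=1.6, thresholds=[-0.8, 0.3, 1.4])
exp_foc = grm_expected_score(theta, a=1.6, thresholds=[-0.6, 0.5, 1.6])

# A small, roughly constant gap would suggest at most negligible uniform DIF
print(f"max expected-score difference: {np.max(np.abs(exp_ref - exp_foc)):.3f}")
```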


2020, pp. 073428292094552
Author(s): Maryellen Brunson McClain, Bryn Harris, Sarah E. Schwartz, Megan E. Golson

Although the racial/ethnic demographics in the United States are changing, few studies evaluate the cultural and linguistic responsiveness of commonly used autism spectrum disorder screening and diagnostic assessment measures. The purpose of this study is to evaluate item and test functioning of the Autism Spectrum Rating Scales (ASRS) in a sample of racially/ethnically diverse parents of children (nonclinical) between the ages of 6 and 18 (N = 806). This study is a follow-up to a prior publication examining the factor structure of the ASRS among a similar sample. The present study furthers the examination of measurement invariance of the ASRS in racially/ethnically diverse populations by conducting differential item functioning and differential test functioning analyses with a larger sample. Results indicate test-level invariance; however, five items are noninvariant across parent reporters from different racial/ethnic groups. Implications for practice and directions for future research are discussed.


2020, Vol 2020, pp. 1-9
Author(s): Amin Mousavi, Zahra Sharafi, Abdolreza Mahmoudi, Hadi Raeisi Shahraki

Background. The Oxford Happiness Inventory (OHI) is a self-report tool for measuring happiness. A brief review of previous studies on the OHI showed that its fairness/equivalence in measuring happiness across identified groups had not been evaluated. Methods. To examine the psychometric properties and measurement invariance of the OHI, responses of 500 university students were analyzed using item response theory and ordinal logistic regression (OLR). Relevant measures of effect size were used to interpret the results. Differential test functioning was also evaluated to determine whether there is overall bias at the test level. Results. OLR analysis detected four items functioning differentially across gender and two items functioning differentially across marital status. An assessment of effect sizes implied that these differences were negligible for practical purposes. Conclusions. This study was a significant step toward providing theoretical and practical information regarding the assessment of happiness by presenting adequate evidence on the psychometric properties of the OHI.
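A rough sketch of ordinal-logistic-regression DIF screening for one polytomous item, in the spirit of the OLR approach described above. The data are simulated, the grouping and matching variables are placeholders, and statsmodels' OrderedModel (statsmodels >= 0.12) is assumed to be available; the study's own specification and effect-size criteria may differ.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 500
trait = rng.normal(size=n)               # matching variable (e.g., rest score)
group = rng.integers(0, 2, size=n)       # 0/1 grouping variable (illustrative)
# Simulated 4-category item response with a small uniform DIF effect
latent = trait + 0.3 * group + rng.logistic(size=n)
item = pd.Series(pd.Categorical(np.digitize(latent, [-1.0, 0.0, 1.0]), ordered=True))

X1 = pd.DataFrame({"trait": trait})                      # Model 1: trait only
X2 = X1.assign(group=group, interaction=trait * group)   # Model 2: + group, + trait x group

m1 = OrderedModel(item, X1, distr="logit").fit(method="bfgs", disp=False)
m2 = OrderedModel(item, X2, distr="logit").fit(method="bfgs", disp=False)

# Likelihood-ratio test: an improvement from Model 1 to Model 2 flags uniform
# and/or non-uniform DIF; effect-size measures are then used to judge whether
# the flagged DIF is practically important.
lr = 2 * (m2.llf - m1.llf)
p_value = stats.chi2.sf(lr, df=2)
print(f"LR chi-square(2) = {lr:.2f}, p = {p_value:.4f}")
```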


Assessment, 2019, pp. 107319111988744
Author(s): William F. Goette, Andrew L. Schmitt, Janice Nici

Objective: Investigate the equivalence of several psychometric measures between the traditional Halstead Category Test (HCT–Original Version [OV]) and the computer-based Halstead Category Test (HCT–Computerized Version [CV]). Method: Data were from a diagnostically heterogeneous, archival sample of 211 adults administered the HCT either by computer (n = 105) or by cabinet (n = 106) as part of a neuropsychological evaluation. Groups were matched on gender, race, education, Full Scale Intelligence Quotient, and Global Neuropsychological Deficit Score. Confirmatory factor analysis was used to examine structural equivalence. Score, variability, and reliability equivalence were also examined. Differential item and test functioning under a Rasch model were examined. Results: A factor structure previously identified for the HCT-OV fit the HCT-CV scores adequately: χ²(4) = 8.83, p = .07; root mean square error of approximation = 0.10 [0.00, 0.20]; standardized root mean residual = 0.03; comparative fit index = 0.99. Total scores and variability of subtest scores were not consistently equivalent between the two administration groups. Reliability estimates were, however, similar and adequate for clinical practice: 0.96 for the HCT-OV and 0.97 for the HCT-CV. About 17% of items showed possible differential item functioning, though only three of these items were statistically significant. Differential test functioning revealed expected total-score differences of less than 1% between versions. Conclusion: The results of this study suggest that the HCT-CV functions similarly to the HCT-OV, with negligible differences in expected total scores between versions. The HCT-CV demonstrated good psychometric properties, particularly reliability and construct validity consistent with previous literature. Further study is needed to generalize these findings and to further examine the equivalence of validity evidence between versions.
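The reported test-level result (expected total-score differences under 1%) can be illustrated with a small Rasch-based sketch: given item difficulties calibrated separately for the two administration formats, compare the expected number-correct curves across ability. The difficulties below are invented and do not reproduce the study's estimates.

```python
import numpy as np

def expected_total(theta, difficulties):
    """Expected number-correct score under the Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
    return p.sum(axis=1)

rng = np.random.default_rng(1)
b_ov = rng.normal(0.0, 1.0, size=50)            # hypothetical HCT-OV item difficulties
b_cv = b_ov + rng.normal(0.0, 0.05, size=50)    # nearly identical HCT-CV difficulties

theta = np.linspace(-3, 3, 61)
gap = expected_total(theta, b_cv) - expected_total(theta, b_ov)
max_pct = np.max(np.abs(gap)) / len(b_ov) * 100
print(f"largest expected total-score difference: {max_pct:.2f}% of the maximum score")
```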


2019, Vol 38 (5), pp. 627-641
Author(s): Beyza Aksu Dunya, Clark McKown, Everett Smith

Emotion recognition (ER) involves understanding what others are feeling by interpreting nonverbal behavior, including facial expressions. The purpose of this study is to evaluate the psychometric properties of a web-based social ER assessment designed for children in kindergarten through third grade. Data were collected from two separate samples of children: the first sample included 3,224 children and the second included 4,419 children. Data were calibrated using the Rasch dichotomous model. Differential item and test functioning were also evaluated across gender and ethnicity. Across both samples, we found consistent item fit, a unidimensional item structure, and adequate item targeting. Analyses of differential item functioning (DIF) found six of 111 items displaying DIF across gender and no items demonstrating DIF across ethnicity. Analyses of person measure calibrations with and without the DIF items yielded no evidence of differential test functioning (DTF) across gender and ethnicity groups in either sample.
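As an illustration of one common Rasch DIF workflow consistent with the description above, the sketch below standardizes the difference between group-specific difficulty estimates for a single dichotomous item; the estimates and standard errors are placeholders, not values from the study.

```python
import math
from scipy import stats

b_group1, se_group1 = 0.42, 0.08   # hypothetical calibration in one gender group
b_group2, se_group2 = 0.55, 0.09   # hypothetical calibration in the other group

contrast = b_group2 - b_group1
se = math.sqrt(se_group1**2 + se_group2**2)
z = contrast / se
p = 2 * stats.norm.sf(abs(z))
print(f"DIF contrast = {contrast:.2f} logits, z = {z:.2f}, p = {p:.3f}")
# In practice a practical-significance threshold (e.g., a contrast of at least
# 0.5 logits) is typically applied before flagging the item for DIF.
```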


AERA Open, 2017, Vol 3 (1), pp. 233285841769299
Author(s): Benjamin W. Domingue, David Lang, Martha Cuevas, Melisa Castellanos, Carolina Lopera, ...

Technical schools are an integral part of the education system, and yet, little is known about student learning at such institutions. We consider whether assessments of student learning can be jointly administered to both university and technical school students. We examine whether differential test functioning may bias inferences regarding the relative performance of students in quantitative reasoning and critical reading. We apply item response theory models that allow for differences in response behavior as a function of school context. Items show small yet consistent differential functioning in favor of university students, especially for the quantitative reasoning test. These differences are shown to affect inferences regarding effect size differences between the university and technical students (effect sizes can fall by 44% in quantitative reasoning and 24% in critical reading). Differential test functioning influences the rank orderings of institutions by up to roughly 5 percentile points on average.
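A small simulation can make the core concern concrete: when many items function slightly in favor of one group, a naive total-score comparison overstates the group difference. The sketch below is illustrative only; all parameters are invented, and it does not reproduce the authors' models or data.

```python
import numpy as np

rng = np.random.default_rng(7)
n_items, n_per_group = 40, 2000
b = rng.normal(0.0, 1.0, n_items)              # common Rasch item difficulties
dif = 0.10                                     # small uniform DIF favoring group A on every item

theta_a = rng.normal(0.20, 1.0, n_per_group)   # true latent advantage of 0.20 SD for group A
theta_b = rng.normal(0.00, 1.0, n_per_group)

def sum_scores(theta, difficulties):
    """Simulate dichotomous Rasch responses and return total scores."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
    return (rng.uniform(size=p.shape) < p).sum(axis=1)

def cohens_d(x, y):
    pooled_sd = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
    return (x.mean() - y.mean()) / pooled_sd

scores_b = sum_scores(theta_b, b)
d_with_dif = cohens_d(sum_scores(theta_a, b - dif), scores_b)   # items slightly easier for group A
d_without = cohens_d(sum_scores(theta_a, b), scores_b)          # same items function identically

print(f"total-score effect size with DIF: d = {d_with_dif:.2f}; without DIF: d = {d_without:.2f}")
```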

