test bias
Recently Published Documents

Total documents: 153 (12 in the last five years)
H-index: 21 (1 in the last five years)

2021 ◽  
Vol 11 (1) ◽  
pp. 1-11
Author(s):  
Ngoc Nhat Minh Nguyen

This paper aims to explore the relationship between how language teachers perceive test bias and where they work, how long they have worked, and where they were professionally trained. The data were collected from 19 in-service English teachers from Eastern and Western settings. They completed a questionnaire in which they were asked to respond to test bias stimuli and to answer questions about their teaching background and training. Each stimulus contained one of two forms of bias: unfair penalization or offensiveness. Qualitative and quantitative analyses showed that teachers were not fully informed about the possible forms of test bias or about the ways potential biases can unfairly penalize or offend students. They were better able to recognize biases of unfair penalization than of offensiveness. Statistical analyses revealed that teachers with over 10 years of experience were better able to recognize potential test bias than those with less experience (at the 90% confidence level). The findings contribute to the currently limited literature on bias in classroom language testing and assessment, with implications for bias review in teacher-developed assessments and for teacher training.
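As an illustrative sketch only (the abstract does not specify the test used, only the confidence level), a comparison of bias-recognition scores between the two experience groups could be run as follows; the scores and the choice of a Mann-Whitney U test, which suits a sample as small as 19 teachers, are assumptions:

```python
import numpy as np
from scipy import stats

# Hypothetical bias-recognition scores (number of biased stimuli correctly
# flagged) for the two experience groups; invented for illustration.
over_10_years = np.array([8, 7, 9, 6, 8, 7, 9, 8])
under_10_years = np.array([6, 5, 7, 6, 5, 4, 6, 5, 7, 6, 5])

# One-sided Mann-Whitney U test: do more experienced teachers score higher?
u, p = stats.mannwhitneyu(over_10_years, under_10_years, alternative="greater")
print(f"U = {u:.1f}, p = {p:.3f}; significant at alpha = 0.10: {p < 0.10}")
```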


2021 ◽  
Author(s):  
Jordan Lasker ◽  
Emil O. W. Kirkegaard ◽  
Helmuth Nyborg

There are few empirically derived theories explaining group differences in cognitive ability. Spearman's hypothesis is one such theory; it holds that group differences are a function of a given test's relationship to general intelligence, g. Research into this hypothesis has generally been limited to the application of a single method lacking sensitivity, specificity, and the ability to assess test bias: Jensen's method of correlated vectors. To overcome the resulting empirical gap, we applied three different psychometrically sound methods to examine the hypothesis among American blacks and whites in the Vietnam Experience Study (VES) and the National Longitudinal Survey of Youth 1979 (NLSY '79). First, we used multi-group confirmatory factor analysis to assess bias and evaluate the hypothesis directly; we found that strict factorial invariance was tenable in both samples and that either the strong or the weak form of the hypothesis was supported, with 87% and 78% of the group differences attributable to g in the VES and the NLSY '79, respectively. Second, using item response theory metrics to avoid pass-rate confounding, we observed a strong relationship between g loadings and group differences (r = 0.80 and 0.79). Finally, assessing differential item functioning with item-level data revealed that a handful of items functioned differently, but their removal did not affect gap sizes much beyond what would be expected from shortening the tests; when the effect of the differential functioning on scores was assessed with an anchoring method, it was found to be negligible in size. In aggregate, the results supported Spearman's hypothesis but not test bias as an explanation for the cognitive differences between the groups we studied.
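For readers unfamiliar with the baseline method the abstract criticizes, a minimal sketch of Jensen's method of correlated vectors follows; the subtest g loadings and standardized group gaps are invented for illustration, not taken from the VES or NLSY '79:

```python
import numpy as np
from scipy import stats

# Hypothetical per-subtest g loadings and standardized group differences (d).
g_loadings = np.array([0.82, 0.75, 0.68, 0.61, 0.55, 0.49])
group_gaps = np.array([0.95, 0.88, 0.70, 0.62, 0.50, 0.41])

# MCV: correlate each subtest's g loading with its group difference.
r, p = stats.pearsonr(g_loadings, group_gaps)
print(f"MCV correlation: r = {r:.2f} (p = {p:.3f})")

# A strong positive r is conventionally read as support for Spearman's
# hypothesis, but, as the abstract notes, MCV alone cannot detect test bias,
# which is why the study adds MGCFA, IRT, and DIF analyses.
```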


Author(s):  
Michele Goyette-Ewing

Author(s):  
Agnieszka Mikołajczyk ◽  
Michał Grochowski ◽  
Arkadiusz Kwasigroch

The paper proposes summarized attribution-based post-hoc explanations for the detection and identification of bias in data. A global explanation is proposed, and a step-by-step framework on how to detect and test for bias is introduced. Since removing unwanted bias is often a complicated and laborious task, bias is instead inserted automatically and then evaluated with the proposed counterfactual approach. The obtained results are validated on a sample skin lesion dataset. Using the proposed method, a number of possible bias-causing artifacts are successfully identified and confirmed in dermoscopy images. In particular, it is confirmed that black frames have a strong influence on the Convolutional Neural Network's predictions: in 22% of cases, an inserted frame changed the prediction from benign to malignant.
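A minimal sketch of the described counterfactual test, assuming a hypothetical `model` with a Keras-style `predict` method that returns a malignancy probability per image; the frame width and decision threshold are placeholders:

```python
import numpy as np

def add_black_frame(image: np.ndarray, width: int = 10) -> np.ndarray:
    # Return a copy of an HxWxC image with a black border `width` pixels wide.
    framed = image.copy()
    framed[:width, :] = 0    # top edge
    framed[-width:, :] = 0   # bottom edge
    framed[:, :width] = 0    # left edge
    framed[:, -width:] = 0   # right edge
    return framed

def flip_rate(model, images: np.ndarray, threshold: float = 0.5) -> float:
    # Fraction of images whose prediction flips from benign to malignant
    # once the suspected artifact (a black frame) is inserted.
    p_before = model.predict(images)
    p_after = model.predict(np.stack([add_black_frame(img) for img in images]))
    flipped = (p_before < threshold) & (p_after >= threshold)
    return float(flipped.mean())

# In the paper's skin-lesion experiment this rate was 22%, confirming the
# black-frame artifact as a source of bias.
```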


2020 ◽  
pp. 144-161
Author(s):  
Kenneth S. Shultz ◽  
David J. Whitney ◽  
Michael J. Zickar

Author(s):  
Eveline Wuttke ◽  
Christin Siegfried ◽  
Carmela Aprea

Due to current trends in society and the economy, financial literacy is often considered an important twenty-first-century skill. Regardless of this postulated relevance, however, studies suggest that financial illiteracy is a widespread phenomenon in many nations. Some studies also show that certain groups perform particularly poorly (e.g. women, persons with a migration background, and/or persons with a low level of education). These differences are often attributed to individual characteristics such as abilities, dispositions, or socialisation patterns. However, available research also suggests that even after controlling for these characteristics, a rather large portion of the performance differences between the various groups of test-takers remains unexplained. One explanation for performance gaps in financial literacy might be that differences in test scores are partly evoked by the test instrument itself and may thus, at least in part, be interpreted as test bias. In this paper, we present a newly developed Situational Judgement Test focused on financial competence. For this test, we examine whether differences between groups are attributable to individual differences or to a test bias. To analyse a possible test bias, we tested one facet of financial literacy related to everyday money management (with three factors: control of one's financial situation, budgeting, and handling of money) for measurement invariance across different groups. Where measurement invariance could be assumed, we analysed group differences with t-tests. Results show that two factors of the test exhibit measurement invariance for all groups considered (gender, migration and educational background, opportunities to learn). Group comparisons are thus possible, and potential differences are not due to test bias. For the third factor, measurement invariance can only be assumed for the groups with/without a migration background and with/without opportunities to learn about financial topics. When we look at group differences, we find that, in contrast to the findings of many previous studies, the analysis of mean differences does not show any systematic deficits in financial literacy for specific groups.
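The invariance testing itself is typically done with multi-group CFA software; the subsequent group-comparison step the abstract describes can be sketched as below on simulated factor scores (the group labels, sample sizes, and effect size are assumptions, not the study's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated factor scores (e.g. the "budgeting" factor) for two groups.
scores_group_a = rng.normal(loc=0.00, scale=1.0, size=120)  # e.g. without migration background
scores_group_b = rng.normal(loc=0.05, scale=1.0, size=110)  # e.g. with migration background

# Welch's t-test, which does not assume equal group variances.
t, p = stats.ttest_ind(scores_group_a, scores_group_b, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")

# Because invariance was established first, a non-significant difference can
# be read as a genuine absence of group deficits rather than as test bias.
```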


Methodology ◽  
2020 ◽  
Vol 16 (3) ◽  
pp. 241-257
Author(s):  
Bruce W. Austin ◽  
Brian F. French

Methods to assess measurement invariance in constructs have received much attention, as invariance is critical for accurate group comparisons. Less attention has been given to the identification and correction of the sources of non-invariance in predictive equations. This work developed correction factors for structural intercept and slope bias in common regression equations to address calls in the literature to revive test bias research. We demonstrated the correction factors in regression analyses within the context of a large international dataset containing 68 countries and regions (groups). A mathematics achievement score was predicted by a math self-efficacy score, which exhibited a lack of invariance across groups. The proposed correction factors significantly corrected structural intercept and slope bias across groups. The impact of the correction factors was greatest for groups with the largest amount of bias. Implications for both practice and methodological extensions are discussed.
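The paper's correction factors are derived analytically; the sketch below only illustrates the quantities they target, namely per-group deviations of regression intercept and slope from the pooled prediction equation, on simulated data (variable names and effect sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_line(x, y):
    # Ordinary least squares for y = intercept + slope * x.
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

# Two groups whose prediction equations differ, i.e. a lack of invariance:
# math self-efficacy (x) predicts achievement (y) with group-specific parameters.
x1 = rng.normal(size=200)
y1 = 500 + 30 * x1 + rng.normal(scale=40, size=200)
x2 = rng.normal(size=200)
y2 = 470 + 20 * x2 + rng.normal(scale=40, size=200)

a_pool, b_pool = fit_line(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
for label, (x, y) in {"group 1": (x1, y1), "group 2": (x2, y2)}.items():
    a_g, b_g = fit_line(x, y)
    # A correction factor would offset this group's predictions by the
    # intercept and slope deviations from the pooled equation.
    print(f"{label}: intercept bias = {a_g - a_pool:+.1f}, "
          f"slope bias = {b_g - b_pool:+.1f}")
```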


2020 ◽  
Vol 10 (1) ◽  
pp. 1-13
Author(s):  
Amir Mahshanian ◽  
Mohammadtaghi Shahnazari

Given the importance of testing in general, and of scoring writing tasks in particular, the negative effect of fatigue on human raters is important to investigate. This study aimed to (1) explore the relationship between fatigue and the scoring of composition tasks written by upper-intermediate EFL learners and (2) investigate the discrepancy in the frequency of comments among EFL raters while scoring composition tasks. Four raters were selected, and each was given 28 composition tasks to score and comment on. The data were analyzed in SPSS using ANOVA, Pearson correlation coefficients, and post-hoc tests. Results suggested that the scores assigned to the first 16 tasks were significantly lower than those assigned to the last 12 tasks, and that the last four tasks were scored highest. Based on the results obtained from the questionnaire, this observed variation is argued to be rooted in rater fatigue and to result in test bias. Furthermore, the findings indicated that the frequency of comments given by the raters on the first 12 essays was significantly higher than on the last 16 essays (the highest and lowest frequencies of comments were observed in the first four and the last four scored essays, respectively).
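The study ran its analyses in SPSS; an equivalent sketch in Python, on simulated scores given a mild upward drift over scoring order to mimic the reported fatigue pattern, might look like this (all numbers are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

order = np.arange(1, 29)                          # 28 tasks scored in sequence
scores = 14 + 0.08 * order + rng.normal(scale=1.0, size=28)

# One-way ANOVA over scoring-position bins (first 16 / next 8 / last 4).
f, p_anova = stats.f_oneway(scores[:16], scores[16:24], scores[24:])

# Pearson correlation between scoring order and assigned score.
r, p_corr = stats.pearsonr(order, scores)

print(f"ANOVA: F = {f:.2f}, p = {p_anova:.3f}")
print(f"order-score correlation: r = {r:.2f}, p = {p_corr:.3f}")
# A positive order-score correlation (later essays scored higher) would be
# consistent with rater fatigue introducing test bias.
```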

