Gender bias in student evaluation of teaching or a mirage?

Author(s):  
Bob Uttl ◽  
Victoria Violo

In a recent small-sample study, Khazan et al. [1] examined SET ratings received by one female teaching assistant (TA) who assisted with two sections of the same online course, one section under her true gender and one under a false/opposite gender. Khazan et al. concluded that their study demonstrated gender bias against the female TA even though they found no statistically significant difference in SET ratings between the male and female TA identities (p = 0.73). To claim gender bias, Khazan et al. ignored their overall findings and focused on the distribution of six “negative” SET ratings, claiming, without reporting any statistical test results, that (a) female students gave more positive ratings to the male TA than to the female TA, (b) the female TA received five times as many negative ratings as the male TA, and (c) female students gave “most low” scores to the female TA. We conducted the missing statistical tests and found no evidence supporting Khazan et al.’s claims. We also requested Khazan et al.’s data to formally examine them for outliers and to re-analyze them with and without the outliers. Khazan et al. refused. We therefore read off the data from their Figure 1 and filled in several values using a brute-force, exhaustive search constrained by the summary statistics reported by Khazan et al. Our re-analysis revealed six outliers and no evidence of gender bias. In fact, when the six outliers were removed, the female TA was rated higher than the male TA, although not significantly so.

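The reconstruction described above is straightforward to illustrate. Below is a minimal sketch, not the authors' code, of filling in a few unreadable ratings by exhaustive search over the rating scale so that the completed sample reproduces a reported mean, and then flagging low outliers; all numbers are hypothetical placeholders, not Khazan et al.'s data.

```python
# Minimal sketch (assumed 1-5 scale, hypothetical numbers): complete a sample
# read off a figure by exhaustive search constrained by a reported mean,
# then flag low outliers with the 1.5*IQR rule.
from itertools import product

import numpy as np

SCALE = range(1, 6)                 # assumed 1-5 rating scale
known = [5, 5, 4, 5, 1, 5, 4, 5]    # ratings readable from the figure (made up)
n_missing = 2                       # ratings that could not be read off
reported_mean = 4.20                # summary statistic from the paper (made up)

completions = []
for fill in product(SCALE, repeat=n_missing):
    sample = known + list(fill)
    if round(np.mean(sample), 2) == reported_mean:
        completions.append(sample)

print(f"{len(completions)} completions reproduce the reported mean")

for sample in completions:
    q1, q3 = np.percentile(sample, [25, 75])
    low_cutoff = q1 - 1.5 * (q3 - q1)
    outliers = [x for x in sample if x < low_cutoff]
    print(sample, "low outliers:", outliers)
```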

Author(s):  
Bob Uttl ◽  
Victoria C. Violo

In a widely cited and widely talked about study, MacNell et al. (2015) [1] examined SET ratings of one female and one male instructor, each teaching two sections of the same online course, one section under their true gender and the other section under a false/opposite gender. MacNell et al. concluded that students rated perceived female instructors more harshly than perceived male instructors, demonstrating gender bias against perceived female instructors. Boring, Ottoboni, and Stark (2016) [2] re-analyzed MacNell et al.’s data and confirmed their conclusions. However, the design of the MacNell et al. study is fundamentally flawed. First, MacNell et al.’s section sample sizes were extremely small, ranging from 8 to 12 students. Second, MacNell et al. included only one female and one male instructor. Third, MacNell et al.’s findings depend on three outliers: three unhappy students (all in perceived-female conditions) who gave their instructors the lowest possible ratings on all or nearly all SET items. We re-analyzed MacNell et al.’s data with and without the three outliers. Our analyses showed that the gender bias against perceived female instructors disappeared. Instead, students rated the actual female instructor higher than the actual male instructor, regardless of perceived gender. MacNell et al.’s study is a real-life demonstration that conclusions based on extremely small samples are unwarranted and uninterpretable.

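A minimal sketch of this kind of outlier-sensitivity re-analysis is shown below; the ratings are invented placeholders, not MacNell et al.'s data, and Welch's t-test stands in for whatever tests the original re-analysis used.

```python
# Invented placeholder ratings, not MacNell et al.'s data; Welch's t-test is
# used here purely to illustrate the with/without-outliers comparison.
import numpy as np
from scipy import stats

perceived_male = np.array([4, 5, 4, 5, 4, 4, 5, 4, 5, 4])
perceived_female = np.array([4, 5, 4, 4, 5, 4, 5, 1, 1, 1])  # three floor ratings

def compare(a, b, label):
    t, p = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
    print(f"{label}: mean diff = {a.mean() - b.mean():+.2f}, t = {t:.2f}, p = {p:.3f}")

compare(perceived_male, perceived_female, "all ratings")

# Drop the ratings at the floor of the scale (the 'outliers' in this illustration)
compare(perceived_male, perceived_female[perceived_female > 1], "floor ratings removed")
```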

2018 ◽  
Vol 51 (03) ◽  
pp. 648-652 ◽  
Author(s):  
Kristina M. W. Mitchell ◽  
Jonathan Martin

Many universities use student evaluations of teachers (SETs) as part of consideration for tenure, compensation, and other employment decisions. However, in doing so, they may be engaging in discriminatory practices against female academics. This study further explores the relationship between gender and SETs described by MacNell, Driscoll, and Hunt (2015) by using both content analysis of student-evaluation comments and quantitative analysis of students’ ordinal scoring of their instructors. The authors show that the language students use in evaluations of male professors is significantly different from the language used in evaluating female professors. They also show that a male instructor administering an online course identical to that of a female instructor receives higher ordinal scores in teaching evaluations, even when questions are not instructor-specific. These findings suggest that the relationship between gender and teaching evaluations may indicate that the use of evaluations in employment decisions is discriminatory against women.

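As a hedged illustration of the quantitative side of such an analysis, the sketch below compares ordinal scores for two instructors with a Mann-Whitney U test; the scores are invented, and Mitchell and Martin's actual procedure may differ.

```python
# Invented placeholder scores; a Mann-Whitney U test is one common choice for
# ordinal ratings and is not necessarily the test used in the original study.
from scipy import stats

male_instructor_scores = [5, 4, 5, 4, 4, 5, 3, 5, 4, 5]
female_instructor_scores = [4, 3, 4, 5, 3, 4, 4, 3, 5, 4]

u, p = stats.mannwhitneyu(male_instructor_scores, female_instructor_scores,
                          alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.3f}")
```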

Author(s):  
Milica Maričić ◽  
Aleksandar Đoković ◽  
Veljko Jeremić

Student evaluation of teaching (SET) has steadily but surely become an important assessment tool in higher education. Although SET provides feedback on students’ level of satisfaction with the course and the lecturer, the validity of its results has been questioned. One factor repeatedly identified in extensive studies as distorting SET results is the gender of the lecturer. In this paper, Potthoff analysis is employed to further explore whether there is gender bias in SET. This analysis has been used with great success to compare linear regression models between groups. Herein, we model the overall lecturer impression with independent variables related to teaching, communication skills, and grading, and compare the resulting models between genders. The obtained results reveal that gender bias exists in certain cases in the observed SET data. We believe that our research may provide additional insights into the topic of gender bias in SET.

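The core idea of Potthoff analysis, comparing regression models between groups, can be sketched as follows: fit one pooled regression and one that lets every coefficient differ by lecturer gender, then F-test whether the gender terms improve the fit. The variable names and data below are hypothetical, not the authors' dataset.

```python
# Hypothetical variable names and simulated data; this only illustrates the
# restricted-vs-full model comparison that underlies Potthoff analysis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "teaching": rng.uniform(1, 5, n),
    "communication": rng.uniform(1, 5, n),
    "grading": rng.uniform(1, 5, n),
    "gender": rng.choice(["male", "female"], n),
})
df["overall"] = (0.6 * df["teaching"] + 0.3 * df["communication"]
                 + 0.1 * df["grading"] + rng.normal(0, 0.4, n))

# Restricted model: a single regression common to both genders.
restricted = smf.ols("overall ~ teaching + communication + grading", data=df).fit()
# Full model: separate intercept and slopes for each gender.
full = smf.ols("overall ~ (teaching + communication + grading) * gender", data=df).fit()

# F-test of full vs. restricted: do the two genders need different regressions?
f_value, p_value, df_diff = full.compare_f_test(restricted)
print(f"F = {f_value:.2f}, p = {p_value:.3f}, df diff = {df_diff:.0f}")
```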

Author(s):  
Christopher R. Dennison ◽  
Cagri Ayranci ◽  
Pierre Mertiny ◽  
Jason P. Carey

Student evaluation of teaching (SET) has been used as a metric to arguably evaluate instructor effectiveness and quality of instruction since the 1920s. SET is used in decisions regarding annual evaluation (of faculty) and is one of the most researched topics in evaluation of instructor effectiveness. Central research questions associated with SET include whether SET is an appropriate measure of effectiveness and whether it leads to improved teaching and quality of graduates. In the fall of 2013, the Faculty of Engineering at the University of Alberta for the first time administered SET online. The transition from paper-based, in-class SET to online, out-of-class SET provides a unique opportunity to investigate changes in SET response rate and ratings of overall instructor effectiveness that could be attributed to the change in protocol and that would suggest protocol-related bias. Our preliminary results show lower response rates for online SET and effectiveness scores that were outside one standard deviation of the previous 5-year mean. These findings show the importance of continuing to monitor web-based SET results and point to directions for further research. At this time, this will not be possible, as the Faculty of Engineering has discontinued online SET testing after a single term.

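The comparison against the historical baseline amounts to a simple check of whether the new score falls within one standard deviation of the prior five-year mean, as in the small sketch below; all figures are hypothetical, not the Faculty of Engineering's actual numbers.

```python
# Hypothetical figures, not the Faculty of Engineering's actual SET results.
import statistics

paper_based_means = [4.2, 4.3, 4.1, 4.25, 4.15]  # prior five years, paper-based
online_score = 3.8                                # first online-administered term

mean_5yr = statistics.mean(paper_based_means)
sd_5yr = statistics.stdev(paper_based_means)

within_one_sd = mean_5yr - sd_5yr <= online_score <= mean_5yr + sd_5yr
print(f"5-year mean = {mean_5yr:.2f}, SD = {sd_5yr:.2f}, "
      f"online score = {online_score}, within 1 SD: {within_one_sd}")
```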

2020 ◽  
Vol 38 (7) ◽  
pp. 866-881
Author(s):  
Alice E. Donlan ◽  
Virginia L. Byrne

End-of-semester evaluations provide scalable data for university administrators, but typically do not provide instructors with timely feedback to inform their teaching practices. Midsemester evaluations have the potential to provide instructors with beneficial formative feedback that can contribute to improved teaching practices and student engagement. However, existing research on the construction of valid, reliable midsemester tools is rare, and there are no existing midsemester evaluation scales that were constructed using education research and psychometric analysis. To address this gap, we designed and piloted a midsemester evaluation of teaching with 29 instructors and 1,350 undergraduate students. We found evidence that our Mid-Semester Evaluation of College Teaching (MSECT) is a valid and reliable measure of four constructs of effective teaching: classroom climate, content, teaching practices, and assessment. Furthermore, our factor structure remained consistent across instructor genders, providing evidence that the MSECT may be less susceptible to gender bias than prior student evaluation measures.

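As a rough, hedged illustration of one basic reliability check behind such a validation (not the MSECT authors' actual psychometric procedure), the sketch below computes Cronbach's alpha for each construct's items separately for sections taught by female and male instructors, using synthetic data.

```python
# Synthetic responses only; the MSECT pilot used its own data and analyses.
import numpy as np

rng = np.random.default_rng(2)

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def simulate_construct(n_students, n_items=4):
    # One latent construct driving all of its items, plus item-level noise.
    latent = rng.normal(size=(n_students, 1))
    return latent + rng.normal(scale=0.6, size=(n_students, n_items))

constructs = ["climate", "content", "teaching practices", "assessment"]
for group, n in [("female instructors", 700), ("male instructors", 650)]:
    alphas = {c: round(cronbach_alpha(simulate_construct(n)), 2) for c in constructs}
    print(group, alphas)
```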
