Small samples, unreasonable generalizations, and outliers: Gender bias in student evaluation of teaching or three unhappy students?

Author(s):  
Bob Uttl ◽  
Victoria C. Violo

In a widely cited and widely discussed study, MacNell et al. (2015) [1] examined SET ratings of one female and one male instructor, each teaching two sections of the same online course, one section under their true gender and the other section under the false/opposite gender. MacNell et al. concluded that students rated perceived female instructors more harshly than perceived male instructors, demonstrating gender bias against perceived female instructors. Boring, Ottoboni, and Stark (2016) [2] re-analyzed MacNell et al.’s data and confirmed their conclusions. However, the design of the MacNell et al. study is fundamentally flawed. First, MacNell et al.’s section sample sizes were extremely small, ranging from 8 to 12 students. Second, MacNell et al. included only one female and one male instructor. Third, MacNell et al.’s findings depend on three outliers: three unhappy students (all in perceived-female conditions) who gave their instructors the lowest possible ratings on all or nearly all SET items. We re-analyzed MacNell et al.’s data with and without the three outliers. Our analyses showed that the gender bias against perceived female instructors disappeared. Instead, students rated the actual female instructor higher than the actual male instructor, regardless of perceived gender. MacNell et al.’s study is a real-life demonstration that conclusions based on studies with extremely small samples are unwarranted and uninterpretable.
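The arithmetic behind this critique is easy to reproduce. The following minimal sketch uses invented 5-point ratings (not MacNell et al.’s actual data) to show how, with section sizes of around ten, three minimum ratings can reverse the sign of a mean comparison:

```python
# Hypothetical 5-point SET ratings; all numbers are invented for
# illustration and are NOT MacNell et al.'s data.
from statistics import mean

perceived_male = [4, 4, 5, 3, 4, 5, 4, 3, 4, 4]
# three unhappy students give the minimum rating in the perceived-female section
perceived_female = [4, 5, 4, 5, 4, 5, 1, 1, 1, 4]

diff_with_outliers = mean(perceived_male) - mean(perceived_female)
trimmed_female = [r for r in perceived_female if r > 1]
diff_without_outliers = mean(perceived_male) - mean(trimmed_female)

print(round(diff_with_outliers, 2))     # 0.6   -> "male rated higher"
print(round(diff_without_outliers, 2))  # -0.43 -> the direction reverses
```

With ten ratings per section, each single rating moves the section mean by roughly a third of a point on a 5-point scale, which is why a handful of extreme responses can dominate the comparison.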

Author(s):  
Bob Uttl ◽  
Victoria Violo

In a recent small-sample study, Khazan et al. [1] examined SET ratings received by one female teaching assistant (TA) who assisted with teaching two sections of the same online course, one section under her true gender and one section under the false/opposite gender. Khazan et al. concluded that their study demonstrated gender bias against the female TA even though they found no statistically significant difference in SET ratings between the male vs. female TA (p = 0.73). To claim gender bias, Khazan et al. ignored their overall findings and focused on the distribution of six “negative” SET ratings, claiming, without reporting any statistical test results, that (a) female students gave more positive ratings to the male TA than to the female TA, (b) the female TA received five times as many negative ratings as the male TA, and (c) female students gave “most low” scores to the female TA. We conducted the missing statistical tests and found no evidence supporting Khazan et al.’s claims. We also requested Khazan et al.’s data to formally examine them for outliers and to re-analyze the data with and without the outliers. Khazan et al. refused. We read off the data from their Figure 1 and filled in several values using a brute-force, exhaustive search constrained by the summary statistics reported by Khazan et al. Our re-analysis revealed six outliers and no evidence of gender bias. In fact, when the six outliers were removed, the female TA was rated higher than the male TA, although not significantly so.
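The brute-force reconstruction step can be sketched in a few lines. This is a hedged illustration of the general technique only; the sample size, rating scale, and summary statistics below are invented, not Khazan et al.’s. The idea is to enumerate all rating multisets of a given size and keep those whose mean and standard deviation match the reported summaries:

```python
# Minimal sketch: exhaustively search for integer ratings consistent with
# reported summary statistics. All numbers are invented for illustration.
from itertools import combinations_with_replacement
from statistics import mean, pstdev

def consistent_ratings(n, reported_mean, reported_sd, scale=range(1, 6), tol=0.005):
    """Return every size-n rating multiset on the scale matching the summaries."""
    matches = []
    for combo in combinations_with_replacement(scale, n):
        if (abs(mean(combo) - reported_mean) <= tol
                and abs(pstdev(combo) - reported_sd) <= tol):
            matches.append(combo)
    return matches

# e.g. six ratings reported only as mean 3.5 with (population) SD 0.5:
print(consistent_ratings(6, 3.5, 0.5))  # -> [(3, 3, 3, 4, 4, 4)], a unique solution
```

When the constraints pin down a unique multiset, as here, the underlying ratings are fully recoverable from the published summaries; when several multisets survive, the analysis can be repeated over all of them.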


2018 ◽  
Vol 51 (03) ◽  
pp. 648-652 ◽  
Author(s):  
Kristina M. W. Mitchell ◽  
Jonathan Martin

Many universities use student evaluations of teachers (SETs) as part of consideration for tenure, compensation, and other employment decisions. However, in doing so, they may be engaging in discriminatory practices against female academics. This study further explores the relationship between gender and SETs described by MacNell, Driscoll, and Hunt (2015) by using both content analysis of student-evaluation comments and quantitative analysis of students’ ordinal scoring of their instructors. The authors show that the language students use in evaluations of male professors is significantly different from the language used in evaluations of female professors. They also show that a male instructor administering an identical online course as a female instructor receives higher ordinal scores in teaching evaluations, even on questions that are not instructor-specific. The findings suggest that the relationship between gender and teaching evaluations may indicate that the use of evaluations in employment decisions is discriminatory against women.
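One standard way to compare two sets of ordinal scores without treating them as interval data is the Mann-Whitney U statistic. The stdlib-only sketch below uses invented ratings (not this study’s data) to show the computation:

```python
# Mann-Whitney U for two sets of ordinal SET scores; ratings are invented.
def mann_whitney_u(a, b):
    """U statistic for sample a vs b: count of pairs with a_i > b_j,
    with ties counted as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

scores_a = [5, 4, 4, 5, 3, 4]  # hypothetical instructor A
scores_b = [4, 3, 4, 3, 5, 3]  # hypothetical instructor B

u = mann_whitney_u(scores_a, scores_b)
# u / (n_a * n_b) estimates P(a random A score exceeds a random B score)
print(u, round(u / (len(scores_a) * len(scores_b)), 3))
```

A value of u / (n_a * n_b) near 0.5 indicates no systematic ordering between the two raters’ scores; values near 0 or 1 indicate a consistent difference.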


2021 ◽  
Vol 2 (1) ◽  
pp. 148-179
Author(s):  
Mohammad Jahangir Hossain Mojumder

Nowadays, demands are growing for outcome-based and transferable learning, particularly in higher education. As the terminal stage of formal schooling, higher education requires teachers to facilitate pupils’ achievement of problem-solving skills for real life. To this end, this qualitative research employs a case study approach, which is suitable for examining an event with small samples, and a phenomenological method to analyze respondents’ perceptions and activities thematically and descriptively to assess changes. In-depth interviews, focus group discussions, and class observations were used to collect data from two selected colleges to examine the extent of professional development and the methodological shift in teaching as effects of training intended to introduce active learning strategies for better learning outcomes. The data reveal that although the selected flagship training program offers a bundle of pedagogical methods (not need-based) to imbibe, the nationally arranged training is not a successful effort to increase trainees’ knowledge and skills and to polish attitudes, beyond disseminating a few concepts superficially. Moreover, trainees lack the motivation to shift their teaching habits and are unconvinced that applying these newly learned strategies will transform anything. Likewise, they are discontented with the training contents and unenthusiastic, with somewhat unfavorable opinions about the training procedures and trainers. Therefore, the results suggest limited or no significant professional development and modification of teaching practice; rather, teachers continue with the conventional teacher-centered method, and the effort remains insufficient, extraneous, ‘fragmented’, and ‘intellectually superficial’. Additionally, at the colleges, large class sizes, inappropriate seating arrangements, pervasive traditionality, absenteeism, and other analogous challenges limited teachers’ ability to change their practice.
Considering all this, the study suggests that alterations should be initiated at the micro level (teachers and colleges) and the macro level (training providers and policymakers) to offer tailor-made, autonomous, and need-based training. Last but not least, this endeavor is limited by being entirely qualitative, with a small sample size, and by not eliciting the views of any of the trainers and policymakers, which indicates points of departure for future study.


2006 ◽  
Vol 361 (1475) ◽  
pp. 2023-2037 ◽  
Author(s):  
Thomas P Curtis ◽  
Ian M Head ◽  
Mary Lunn ◽  
Stephen Woodcock ◽  
Patrick D Schloss ◽  
...  

The extent of microbial diversity is an intrinsically fascinating subject of profound practical importance. The term ‘diversity’ may allude to the number of taxa or species richness, as well as their relative abundance. There is uncertainty about both, primarily because sample sizes are too small. Non-parametric diversity estimators make gross underestimates if used with small sample sizes on unevenly distributed communities. One can make richness estimates over many scales using small samples by assuming a species/taxa-abundance distribution. However, no one knows what the underlying taxa-abundance distributions are for bacterial communities. Latterly, diversity has been estimated by fitting data from gene clone libraries and extrapolating from this to taxa-abundance curves to estimate richness. However, since sample sizes are small, we cannot be sure that such samples are representative of the community from which they were drawn. It is, however, possible to formulate, and calibrate, models that predict the diversity of local communities and of samples drawn from those local communities. The calibration of such models suggests that migration rates are small and decrease as the community gets larger. The preliminary predictions of the model are qualitatively consistent with the patterns seen in clone libraries in ‘real life’. The validation of this model is also confounded by small sample sizes. However, if such models were properly validated, they could form invaluable tools for the prediction of microbial diversity and a basis for the systematic exploration of microbial diversity on the planet.
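As one concrete example of the non-parametric estimators in question, the classical Chao1 richness estimator is driven almost entirely by the rare taxa in the sample (singletons and doubletons), which is why small, uneven samples yield gross underestimates of true richness. A minimal sketch, with invented counts:

```python
# Chao1 lower-bound richness estimate from per-taxon counts.
# The example counts are invented for illustration.
def chao1(abundances):
    """Chao1 estimate: S_obs + F1^2 / (2 * F2), where F1 and F2 are the
    numbers of taxa seen exactly once and exactly twice."""
    s_obs = sum(1 for a in abundances if a > 0)
    f1 = sum(1 for a in abundances if a == 1)  # singletons
    f2 = sum(1 for a in abundances if a == 2)  # doubletons
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2  # bias-corrected form when F2 = 0
    return s_obs + f1 * f1 / (2 * f2)

# A tiny clone-library-like sample: 6 observed taxa, 3 singletons, 1 doubleton
sample = [10, 4, 2, 1, 1, 1]
print(chao1(sample))  # 10.5
```

Note that Chao1 is only a lower bound: taxa never sampled at all contribute nothing to F1 or F2, so for a highly uneven community a small sample can leave the bulk of the richness invisible to the estimator.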


2016 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Giuliana Cortese

Receiver operating characteristic (ROC) curves are a frequent tool for studying the discriminating ability of a certain characteristic. The area under the ROC curve (AUC) is a widely used measure of the statistical accuracy of continuous markers for diagnostic tests, and has the advantage of providing a single summary index of the overall performance of the test. Recent studies have shown some critical issues related to traditional point and interval estimates for the AUC, especially for small samples, more complex models, unbalanced samples, or values near the boundary of the parameter space, i.e., when the AUC approaches the values 0.5 or 1. Parametric models for the AUC have been shown to be powerful when the underlying distributional assumptions are not misspecified. However, in the above circumstances parametric inference may not be accurate, sometimes yielding misleading conclusions. The objective of the paper is to propose an alternative inferential approach based on modified profile likelihoods, which provides more accurate statistical results in any parametric setting, including the above circumstances. The proposed method is illustrated for the binormal model, but can potentially be used in any other complex model and for any other parametric distribution. We report simulation studies to show the improved performance of the proposed approach when compared to classical first-order likelihood theory. An application to real-life data in a small-sample setting is also discussed, to provide practical guidelines.
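For reference, the binormal model mentioned above has a closed-form AUC: if the marker is N(mu0, sigma0) in non-diseased subjects and N(mu1, sigma1) in diseased subjects, then AUC = Phi(a / sqrt(1 + b^2)) with a = (mu1 - mu0)/sigma1 and b = sigma0/sigma1. A minimal sketch with invented parameter values:

```python
# Closed-form AUC under the binormal model; parameter values are invented.
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def binormal_auc(mu0, sigma0, mu1, sigma1):
    """AUC = Phi(a / sqrt(1 + b^2)), a = (mu1 - mu0)/sigma1, b = sigma0/sigma1."""
    a = (mu1 - mu0) / sigma1
    b = sigma0 / sigma1
    return norm_cdf(a / sqrt(1.0 + b * b))

# Equal variances with one SD of separation between the groups:
print(round(binormal_auc(0.0, 1.0, 1.0, 1.0), 3))  # about 0.76
```

This closed form is exactly the quantity whose point and interval estimates become unstable near the boundaries 0.5 and 1, which motivates the modified profile likelihood approach proposed in the paper.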


Author(s):  
Milica Maričić ◽  
Aleksandar Đoković ◽  
Veljko Jeremić

Student evaluation of teaching (SET) has steadily, but surely, become an important assessment tool in higher education. Although SET provides feedback on students’ level of satisfaction with the course and the lecturer, the validity of its results has been questioned. After extensive studies, one factor believed to distort SET results is the gender of the lecturer. In this paper, Potthoff analysis is employed to further explore whether there is gender bias in SET. Namely, this analysis has been used with great success to compare linear regression models between groups. Herein, we aimed to model the overall lecturer impression with independent variables related to teaching, communication skills, and grading, and to compare the models between genders. The obtained results reveal that gender bias exists in certain cases in the observed SET. We believe that our research might provide additional insights into the interesting topic of gender bias in SET.
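At its core, a Potthoff-style comparison of regression lines between groups is a nested-model F-test: fit one regression pooled across groups, fit another with group-specific intercepts and slopes, and test whether the extra group terms reduce the residual sum of squares more than chance would allow. A minimal sketch on synthetic data (all variable names and numbers are invented, not this study’s SET data):

```python
# Sketch of a Potthoff-style comparison of regression lines between groups.
# Synthetic data: group 1's outcomes follow a different line by construction.
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = rng.uniform(1, 5, size=2 * n)        # e.g. a teaching-related predictor
g = np.repeat([0.0, 1.0], n)             # group indicator (e.g. lecturer gender)
y = 1.0 + 0.6 * x + g * (1.0 + 0.3 * x) + rng.normal(0, 0.3, size=2 * n)

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

ones = np.ones_like(x)
rss_pooled = rss(np.column_stack([ones, x]), y)          # one common line
rss_full = rss(np.column_stack([ones, x, g, g * x]), y)  # per-group lines

q, df = 2, 2 * n - 4                     # 2 extra parameters in the full model
F = ((rss_pooled - rss_full) / q) / (rss_full / df)
print(F)  # compare against the F(2, 76) critical value (about 3.12 at alpha = .05)
```

A significant F indicates that the groups do not share a common regression line, i.e., that the relationship between the predictors and the overall lecturer impression differs by group.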

