In a widely cited and widely talked about study, MacNell et al. (2015) [1] examined SET ratings of one female and one male instructor, each teaching two sections of the same online course, one section under their true gender and the other section under false/opposite gender. MacNell et al. concluded that students rated perceived female instructors more harshly than perceived male instructors, demonstrating gender bias against perceived female instructors. Boring, Ottoboni, and Stark (2016) [2] re-analyzed MacNell et al.’s data and confirmed their conclusions. However, the design of MacNell et al. study is fundamentally flawed. First, MacNell et al.’ section sample sizes were extremely small, ranging from 8 to 12 students. Second, MacNell et al. included only one female and one male instructor. Third, MacNell et al.’s findings depend on three outliers – three unhappy students (all in perceived female conditions) who gave their instructors the lowest possible ratings on all or nearly all SET items. We re-analyzed MacNell et al.’s data with and without the three outliers. Our analyses showed that the gender bias against perceived female instructors disappeared. Instead, students rated the actual female vs. male instructor higher, regardless of perceived gender. MacNell et al.’s study is a real-life demonstration that conclusions based on extremely small sample-sized studies are unwarranted and uninterpretable.