Why are there so few female professors? Although the fraction of women enrolling in graduate programs has increased over recent decades, the proportion of women who continue their careers in academia remains low. One possible explanation for these gender disparities is gender-biased teaching evaluations. In the competitive world of academia, student evaluations are an important and frequently used criterion for assessing faculty performance. Outcomes of teaching evaluations affect hiring, tenure, and promotion decisions and thus have a strong impact on career progression.
Teaching evaluations are not the only domain in which there is evidence of gender bias in academia. Gender has, for example, been shown to matter for the success of grant proposals (see Van der Lee and Ellemers, 2015). Analysing gender bias is, however, not straightforward, for two reasons. First, it is often difficult to provide evidence on potential underlying differences in objective performance between women and men. Second, if evaluators are not randomly assigned to the individuals they evaluate, the estimated bias might reflect the sorting of evaluators who hold stereotypes of different strength to male and female teachers. A few studies have overcome the latter problem by using random variation in the composition of hiring and promotion committees, and have found mixed results (see Bagues et al., 2017, and De Paola and Scoppa, 2015).
Why is it important to think about whether teaching evaluations are biased? Bad teaching evaluations for women may not only have a direct negative effect on career progression, but may also have an indirect effect by leading women to reallocate scarce resources from research to teaching. This reallocation may in turn lead to less, or lower-quality, research output. Gender-biased teaching evaluations can also affect instructors’ self-confidence and beliefs about their teaching abilities, which may influence a woman’s decision on whether to continue in academia.
In a recent study, we investigated whether female teachers receive lower teaching evaluations, using an exceptionally rich dataset of 19,952 evaluations of instructors at a Dutch university. An important advantage of the study over previous work is that students are randomly assigned to either female or male instructors within courses. This implies that gender differences in evaluations cannot simply reflect that women and men, for example, teach different courses, or that more critical students choose courses with more female teachers. Another advantage over previous studies on gender bias is that the data also contain information on students’ grades and study effort, which makes it possible to test whether gender differences in evaluations could be justified by objective performance differences between male and female teachers.
We find that female instructors receive systematically lower teaching evaluations than their male colleagues, even though the instructor’s gender affects neither students’ grades nor the effort students put into the course. These lower teaching evaluations of female faculty stem mostly from male students, but the effect is also present, though smaller in size, among female students. The gender bias is largest for junior women and more concentrated in math-related courses, in which stereotypes against female teachers might be stronger than in non-math-related courses.
In teaching evaluations, students are asked to evaluate not only their teacher’s performance but also the learning materials, such as textbooks, research articles, or the online learning platform. In our setting, these are identical for all sections of a course, irrespective of whether the section teacher is male or female. Strikingly, we find that these evaluation items, which are not under the control of the section teacher, are rated more negatively when the teacher is female than when the teacher is male. One possible mechanism to explain this spillover effect is that students anchor their responses to material-related questions on their earlier responses to instructor-related questions.
We find no evidence that these gender differences in teaching evaluations are driven by gender differences in teaching performance. The results show that the gender of the instructor does not affect current or future grades, nor does it affect students’ effort, measured as self-reported study hours.
These results imply that teaching evaluations should be used with caution when evaluating individual teachers. Although frequently used for hiring and promotion decisions, teaching evaluations are usually not corrected for possible gender bias, for the gender composition of the students, or for the fact that not all students participate in these evaluations. If the gender bias in teaching evaluations is not taken into account, female teachers may receive fewer teaching awards and fewer pay rises, and may ultimately be promoted less often than male teachers. Besides these possible direct effects, indirect consequences, such as the reallocation of scarce time from research to teaching, can further exacerbate this problem. For these reasons, it is important to reconsider the role of teaching evaluations in performance reviews of individual teachers, and to put less weight on them.