Timeline: Contract language on course evaluations 1976-present

WMU faculty are rightly proud of the top-quality instruction we provide to our students and deeply invested in receiving substantive feedback from them. The WMU-AAUP is equally invested in helping our faculty colleagues access reliable, useful feedback that is free of the racial, gender, and other biases that many of our colleagues have unfortunately experienced firsthand in their ratings and that are by now well documented as endemic to student evaluations of teaching. Our 2011 and 2014 negotiation teams raised these issues at the bargaining table. Both times, the administration refused to engage in the conversation. Still, we remain hopeful that creative solutions can be found through open collaboration among faculty, administration, and students.

The first WMU-AAUP Agreement (1976-77):

The authority for determining how teaching effectiveness would be measured and reported belonged to the faculty: It was “the responsibility of the faculty in each department” to determine “the evaluation methods to be used” and “the procedures to be followed.”

A department’s procedures “may provide for” student ratings. Peer evaluation was required, while “external evaluation” was “encouraged in appropriate circumstances,” and “self-evaluation” was “strongly recommended.”


1977-78:

The role of the faculty is reaffirmed. New language appears, emphasizing that “student evaluations are primarily intended for use in faculty self-improvement.” Similar language is retained in every subsequent contract and still appears in Article 16 of the 2014-17 Agreement.


The original 1976-77 language along with the language added in 1977-78 is retained. The authority for determining how evaluation of teaching will be conducted and procedures for how evaluative data is reported remains with the faculty.


1981-84:

Student ratings are required for the first time “in at least one semester of each academic year.” Similar language governing the frequency of data collection appears in every subsequent contract, although it was strengthened in 2002 and the stronger version has been retained since then.


1984-87:

First contractual mention of a “nationally normed” rating instrument to be used across all departments at WMU, although it goes only as far as establishing a committee to consider such an instrument. No faculty rights from previous contracts are relinquished.

1987-90 and 1990-93:

Nothing here to indicate that a “nationally normed” instrument has been adopted at WMU, despite the 1984-87 directive. No mention again of a uniform instrument to be used across the university until 1993-96. No faculty rights from previous contracts are yielded in 1987-90 or 1990-93.


1993-96:

First contract language to require one university-wide instrument. Also notable because it appears to shift responsibility and authority for determining evaluation procedures away from department faculty. However, the instrument and guidelines for use are not specified, and the language of the next two contracts indicates that the university-wide instrument and guidelines envisioned in 1993 did not actually come to pass while the 1993-96 contract was in effect.

1996-99 and 1999-2002:

First to state that “Departmental faculty shall use a uniform student rating form” for all faculty in the department. No reference to a university-wide instrument, despite language included in the 1993-96 contract on that topic. Specifies that “Each department may maintain in use its own established rating form, or may generate its own form, or may select a rating form from a file of such instruments housed in the Office of Faculty Development.” The same language is used in the 1999-2002 Agreement.


2002-05:

Language governing the frequency for collecting student rating data in every contract since 1981-84 is revised and expanded. Original language directed that “Student evaluations shall be conducted in every class taught by unit faculty members in at least one semester of each academic year.” The 2002-05 version adds “(to be determined by the faculty member)” at the end of the sentence. This strengthened language from the 2002-05 Agreement has appeared in every subsequent contract and still stands in the 2014-17 Agreement.

Parties also “agreed to move, on a trial basis, over the course of this Agreement, toward the use of one valid and reliable student rating instrument by all members of the bargaining unit.” To that end, a new committee is established, along with a timeline for finding and adopting an instrument.

2005-08 and 2008-11:

Both preserve all faculty rights regarding the collection and use of rating data. The 2005-08 Agreement adds new language on a university-wide student rating instrument to be used across all departments and establishes the ICES Steering Committee.

The parties “agree to the use of one valid and reliable student rating instrument by all members of the bargaining unit,” namely the one “recommended by the Evaluation Study Committee in its report of February 14, 2003,” a/k/a ICES (paper version). The same language appears in the 2008-11 contract, which also retains all faculty rights regarding the collection and use of rating data.


2011-14:

Language codifying the use of ICES online is added. All language articulating faculty rights regarding the collection and use of rating data is retained.

Context: Student participation rates in the online version of ICES introduced in 2010 were so low as to render the data practically meaningless. At the table in 2011, the administration’s team proposed an all-classes, every-semester model for evaluation as a fix. The proposal appeared to be an attempt to get the WMU-AAUP team to give up a faculty right that has been in every contract since 1981, and it would not have solved the problems at hand. The WMU-AAUP team said no and challenged the general reliability of student ratings, citing research demonstrating that bias is endemic to student evaluations. The administration’s team was not interested in talking about that.

The two sides negotiated a Letter of Agreement (LOA) to establish a joint committee to work on ideas for improving student participation rates before the next negotiation (2014).


2014-17:

The joint committee established by the 2011 LOA was not successful. The WMU-AAUP appointees resigned when administrators on the committee tried to expand the committee’s charge and reconsider how faculty teaching is evaluated more widely, a contractual matter that was therefore better suited for the bargaining table. The committee was disbanded shortly thereafter.

At the table in 2014, the two sides again discussed the low student participation rates. Once again, the administration’s team suggested moving to an every-class, every-semester model, and once again they did not offer any evidence that this would improve per-class participation rates. The WMU-AAUP team said no and once again brought evidence of bias in course evaluations. And once again, the administration would not discuss this.

Another LOA establishing another joint committee was signed. This time, a timeline for the committee’s work was included. The committee’s recommendation in 2015 was to conduct a pilot study in which all sections of all courses would be evaluated, a model that the 2011 and 2014 WMU-AAUP negotiation teams had already rejected at the bargaining table.

The WMU-AAUP Executive Committee, in consultation with the 2014 team, voted against the proposal on grounds that it would violate the longstanding language in Article 16§4: “Student ratings shall be conducted in each class taught by a bargaining unit faculty member in at least one semester of each academic year (to be determined by the faculty member).”

All language articulating faculty rights regarding the collection and use of rating data is retained in the 2014-17 Agreement.


Some recent studies and reports on bias in course evaluations:

“An evaluation of course evaluations,” Stark and Freishtat, Science Open Research, September 26, 2014.

“Bias against female instructors,” Colleen Flaherty, Inside Higher Ed, January 11, 2016.

“Do the best professors get the worst ratings?” Nate Kornell, Psychology Today, May 31, 2013.

“Evaluating students’ evaluations of professors,” Braga, Paccagnella, and Pellizzari, Economics of Education Review 41, August 2014.

“Flawed evaluations,” Colleen Flaherty, Inside Higher Ed, June 10, 2015.

“Needs improvement: Student evaluations of professors aren’t just biased and absurd—they don’t even work,” Rebecca Schuman, Slate, April 24, 2014.

“Online students give instructors higher marks if they think instructors are men,” North Carolina State University News, December 9, 2014.

“Student evaluations of teaching (mostly) do not measure teaching effectiveness,” Boring, Ottoboni, and Stark, Science Open Research, January 7, 2016.

“Student course evaluations get an F,” Anya Kamenetz, NPR.org, September 26, 2014.

“Student evaluations offer bad data that leads to the wrong answer,” Stuart Rojstaczer, New York Times, September 18, 2012.

“Student evaluations of teaching are probably biased. Does it matter?” Erik Voeten, Washington Post, October 10, 2013.

“Students praise male professors,” Kaitlin Mulhere, Inside Higher Ed, December 10, 2014.