My Experience with Specification Grading

Manuel A. Pérez-Quiñones
Jan 4, 2021

The Spring 2020 academic semester finished with most universities moving to online education in response to the pandemic. During the Summer of 2020, a significant portion of the US citizenry went to the streets to denounce police violence. By this time, the early numbers from the pandemic showed that communities of color were being disproportionately impacted by the coronavirus.

It is within that context that I prepared to teach, in the Fall 2020 semester, an asynchronous online version of ITIS 4440 Interactive Systems Design and Implementation. It was the first time that I taught fully online. But my biggest concern was how to make the classroom more equitable. It was clear in my mind that I would have students impacted by the virus, suffering the financial burdens of a slowing economy, and/or feeling the physical and emotional pain that came with the social protests demanding justice for African-Americans.

While exploring how to make my classroom more equitable, I found the book Grading for Equity by Joe Feldman. Surprisingly, there was a copy of the book already in my house, the privilege of being married to another educator and having a daughter who taught for Teach for America. I started reading parts of the book and realized how much I had to learn. I narrowed my search to grading in computer science and found a couple of articles that talked about specification grading. That led me to the work of Dr. Linda B. Nilson. I found the ideas behind specification grading fascinating and highly applicable to my situation. So, I set off to try it out.

I am documenting here how I used specification grading in my classroom during the Fall 2020 semester. It is too early to make bold claims of magical solutions, as I am not convinced that my implementation of the ideas of specification grading is fine-tuned yet and I only have one semester as a reference. Nevertheless, I am definitely not going back to the old way of grading.

After reading a few papers on the topic, this is how I interpreted specification grading:

  1. Grading student work on a Pass/Fail basis (called Satisfactory and Unsatisfactory), which makes grading easier and encourages students to reach Satisfactory via re-submissions (see next item). Satisfactory is normally set at a high level, typically B or higher, thus demanding rigor in students’ work.
  2. Allowing multiple submissions with quick feedback, so students can focus on getting the assignment to a Satisfactory level rather than worrying about a deadline and penalty points for missing it.
  3. Grading based on rubrics specifically designed to assess whether the assignment meets the Satisfactory requirement, rather than the point-based minutiae that might be needed to justify the difference between an A and a B.
  4. Grouping Satisfactory submissions into bundles (more on this below) to determine the final grade, rather than using the average of scores over many different types of assignments as the indicator of the final grade.

How I explained it to the students

In the first episode of the CS Ed Podcast (season 2), Dr. Kristin Stephens-Martinez asked me how I explained this idea to my students and asked if I was willing to share my explanation. Here is a link to the part of my syllabus that explained how specification grading was going to work in the semester. I also recorded a short narration as part of my intro lecture and have included it below. The video used an older version of the syllabus than what is linked above.

Student Graded Work

In my class, students have different types of graded work. One type is workouts: short drill-and-practice programming problems available in CodeWorkout (https://codeworkout.cs.vt.edu). Another type is regular homework: short assignments with online quizzes in Canvas, intended to keep students engaged throughout the semester. A third type is the traditional larger programming assignments, which are graded via submission to Web-CAT (http://web-cat.org). A fourth type is what I call Technotes: short “how to” writeups that include a short video and a peer review of other students’ videos. Finally, we have traditional exams done with online quizzes via Canvas.

In Fall 2020, my course had 8 Workouts, 8 Homework assignments, 2 Technotes, 4 Programming Projects and 3 Exams. Instead of averaging the scores, I set a baseline for Satisfactory as suggested in the specification grading literature. For each type of assignment, students had to meet the following level for a submission to count as Satisfactory: workouts and homework required 80%, programming projects required 70%, and a technote was deemed Satisfactory by hand-grading with a rubric. For exams, the average of the highest two exams was used to determine the grade. These levels were then grouped into bundles, as explained below.
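The thresholds above can be written down as a small sketch. This is only an illustration, not the setup actually used in Canvas, CodeWorkout, or Web-CAT: the names `SATISFACTORY_THRESHOLD`, `is_satisfactory`, and `exam_average` are hypothetical, and treating a score exactly at the threshold as Satisfactory is an assumption. Technotes are left out because they were hand-graded with a rubric.

```python
# Minimum percentage for a submission to count as Satisfactory, by type.
# The numbers come from the text; the names are hypothetical.
SATISFACTORY_THRESHOLD = {
    "workout": 80.0,
    "homework": 80.0,
    "project": 70.0,
}


def is_satisfactory(kind: str, score: float) -> bool:
    """Return True if a percentage score meets the baseline for its type.

    Assumption: a score exactly equal to the threshold counts as meeting it.
    """
    return score >= SATISFACTORY_THRESHOLD[kind]


def exam_average(exam_scores: list[float]) -> float:
    """Average of the highest two exam scores (0.0 if no exams were taken)."""
    top_two = sorted(exam_scores, reverse=True)[:2]
    if not top_two:
        return 0.0
    return sum(top_two) / len(top_two)
```

For instance, a student with exam scores of 60, 90, and 80 would have the 60 dropped and end with an exam average of 85.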

Bundles

The connection between Satisfactory assignments and the final grade was made via bundles, as explained in the video. This is very similar to several of the approaches described by Dr. Nilson in her book (see for example pages 44–46 of her book “Specification Grading”). The numbers shown below vary slightly from what was in the video; they reflect what I used at the end of the semester to grade the class.

Minimum requirements for an A

  • Complete to the Satisfactory level at least: 7 Workouts, 7 Homework, 3 Programming projects, 2 Technote writeups, 2 Technote videos, and all the peer-reviews assigned.
  • Obtain an average of 90% in two Exams

Minimum requirements for a B

  • Complete to the Satisfactory level at least: 5 Workouts, 5 Homework, 3 Programming projects, 2 Technote writeups, 1 Technote video, and all the peer-reviews assigned.
  • Obtain an average of 80% in two Exams

Minimum requirements for a C

  • Complete to the Satisfactory level at least: 4 Workouts, 4 Homework, 2 Programming projects, 1 Technote writeup, 1 Technote video, and half of the peer-reviews assigned.
  • Obtain an average of 70% in two Exams

Minimum requirements for a D

  • Complete to the Satisfactory level at least: 3 Workouts, 3 Homework, 1 Programming project, 1 Technote writeup, and 1 Technote video.
  • Obtain an average of 60% in two Exams
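
The bundle requirements above amount to a lookup: a student earns the highest grade whose minimums are all met. Here is a minimal sketch of that rule, under stated assumptions. The names `Counts`, `BUNDLES`, and `final_grade` are hypothetical; peer reviews are encoded as the fraction of assigned reviews completed (1.0 for all, 0.5 for half); the D bundle lists no peer-review requirement, so it is encoded as 0.0; and falling short of the D bundle is assumed to mean an F.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Counts:
    """Satisfactory counts per category, plus peer reviews and exam average."""
    workouts: int
    homework: int
    projects: int
    technote_writeups: int
    technote_videos: int
    peer_review_fraction: float  # assumed encoding: 1.0 = all, 0.5 = half
    exam_average: float          # average of the two highest exams, in percent


# Minimum requirements per grade, listed from highest to lowest.
BUNDLES = {
    "A": Counts(7, 7, 3, 2, 2, 1.0, 90.0),
    "B": Counts(5, 5, 3, 2, 1, 1.0, 80.0),
    "C": Counts(4, 4, 2, 1, 1, 0.5, 70.0),
    "D": Counts(3, 3, 1, 1, 1, 0.0, 60.0),  # no peer reviews listed for a D
}


def final_grade(earned: Counts) -> str:
    """Return the highest grade whose bundle is fully satisfied."""
    for letter, required in BUNDLES.items():
        # Every field of the student's record must reach the bundle minimum.
        if all(e >= r for e, r in zip(astuple(earned), astuple(required))):
            return letter
    return "F"  # assumption: missing the D bundle means failing
```

Note how a single category can hold a grade down: `final_grade(Counts(8, 8, 2, 2, 2, 1.0, 95.0))` yields a C, because only 2 Satisfactory projects fall short of the 3 required for both the A and the B bundle. That is the situation in which a student would benefit from completing one more project rather than polishing work in any other category.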

Deadlines

I set up all assignments to have a Due Date according to when the material was covered in the course. I wanted students to pace themselves and did not want them to ignore the course until the last week and then try to rush through everything at the end. Adding a deadline on Canvas also meant that the course home page showed Coming Up deadlines. And a few times, I sent a note at the beginning of the week with a reminder of assignments coming up.

In addition to the Due Date, I also set up an Accept Until date a week later for all assignments. There was no penalty associated with a late submission, so the official deadline was somewhat flexible and students could be a little late without consequence. The idea of the homework being late was always present, however, which might have increased the importance of keeping up with work in the students’ minds.
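
The Due Date and Accept Until pair can be pictured as a simple rule. The sketch below is only an illustration of the policy described here, not how Canvas implements it; `ACCEPT_WINDOW` and `submission_status` are hypothetical names, and the status strings are made up for the example.

```python
from datetime import datetime, timedelta

# Accept Until was set one week after the Due Date for every assignment.
ACCEPT_WINDOW = timedelta(days=7)


def submission_status(due: datetime, submitted: datetime) -> str:
    """Classify a submission under the no-penalty late policy."""
    if submitted <= due:
        return "on time"
    if submitted <= due + ACCEPT_WINDOW:
        # Shown as late, but graded exactly like an on-time submission.
        return "late, no penalty"
    # Past Accept Until: the assignment is closed unless it is reopened.
    return "closed"
```

Only the last case needs any action from the instructor, which keeps the bookkeeping light compared to a token system.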

Some of the specification grading literature suggests the use of tokens or a similar currency, earned with early submissions, that can be exchanged for re-submissions or late submissions. I considered that but deemed it too much of a hassle to implement this first time around. Instead, if a student wanted to submit something after the Accept Until date, as a few did at the end of the semester, I opened the homework again for late submission. I did this for the few students who requested it. Most of them were students trying to increase their final grade by meeting the requirements of a higher grade bundle (see the Bundles section above).

My rationale and concerns before the semester

This is what was going through my mind at the beginning of the semester, related to the use of specification grading in my course.

  1. I wanted something that was flexible and that would motivate students to work towards a better grade.
  2. I was concerned with how promptly I could grade students’ work as a way to keep the course flowing, and thus wanted to take full advantage of the autograders CodeWorkout and Web-CAT (disclosure: I had lots of experience using both of these systems).
  3. I wanted to have no penalties for late submissions. I was convinced that late submission penalties are unfair to students who are juggling multiple personal responsibilities (e.g., working parents, helping at home with small siblings, holding multiple part-time jobs, etc.). This seemed particularly relevant at this point in time.
  4. I worried about cheating, particularly with a fully asynchronous online class.
  5. I worried about students’ pushback on the grading method; in particular, I was concerned about students claiming that the grading policies were unfair or not equitable.
  6. And finally, I had the usual worries that all professors/teachers have when they try something new, exacerbated by my naivete about online teaching and by the COVID-19 concerns.

Some observations after the semester

Honestly, I think specification grading in my course worked very well. I got two direct emails from students thanking me for the semester, one in particular saying: “Thank you for having such an organized online course during this chaotic time. It’s made all the difference this semester.” The full student evaluations will be available later in January and I will get a better sense of students’ feedback then. Until then, here are some personal observations.

  1. Grade distribution was skewed towards A (1/3 of the class) but not enough that it worries me. The rest of the grades (2/3) were evenly distributed among B, C, D, and F. The university approved a Pass/No Credit option that students were able to request right up until the last day of the semester. About 30% of the students requested a P/N, so the actual recorded grades were slightly different, as some Cs, Ds, and Fs turned into Pass/No Credit.
  2. I was satisfied (maybe even surprised?) with the positive impact that removing the traditional grading scheme had on students’ effort. I have always felt uncomfortable with students spending time to get the last few points in a programming assignment at the expense of studying for a test or doing other daily homework. A 5-point difference in a score is only meaningful if you are trying to squeeze every point out of one assignment to compensate for lost points in another. From a learning objectives point of view, scoring higher than 85 or 90 is irrelevant; it is very likely that a student has already met the learning objectives even if they lose a few points. Letting students redirect the energy required to get those extra points, without sacrificing their final grade, seems like a good idea to me. From a specification grading point of view, a submission that is Satisfactory is no different from a perfect score.
  3. Allowing for late submissions removed obstacles to success that were tied to a point in time in the semester (e.g. homework due on Friday or else you lose points). The outside demands on students’ attention during the Fall of 2020 were high. In my classroom alone I had students who lost part-time jobs, who fell sick for various reasons, who had friends or family members infected with the virus, and who struggled with the demands of an online class (which was not their choice). I saw students submitting assignments out of order, taking advantage of the no-penalty policy for late work. For example, some students completed the last workout before going back to attempt an earlier workout that they would need to reach a bundle with a higher grade. Some students made submissions to earlier assignments during the final week once they realized they did not need to take the final exam. One student skipped Project 2 due to a personal situation, and instead completed Project 3 and Project 4 to meet the requirements of a higher grade. None of these examples would have been possible under my previous, traditional way of enforcing deadlines.
  4. At the end of the semester, several students met some of the requirements for a bundle with a high grade (e.g., exams, workouts, technotes) but were short in another. These students opted to spend time in the last two weeks of the term working on the categories required to raise their grade (e.g., programming projects). For some, this brought their final grade up to a passing level; for others, it was the difference between a C and a B. The areas where extra work was required were evenly distributed: some students submitted projects, some did workouts, some completed the last homework, and some took the final exam to raise their exam average. Clearly, replacing an “average” across all types of work with a count of Satisfactory work allowed the students to focus precisely where they needed more work to impact their grade. The no-late-penalty policy worked hand in hand with this by encouraging them to keep working on those areas.
  5. I did not see blatant or obvious cheating. I did not see too many students with perfect scores, which is typically an indication that they got the solution to the problems elsewhere. I did not see too many students with identical answers in those assignments where such a solution would have been detected. I also saw scores for submissions to CodeWorkout and Web-CAT steadily increase as students made additional submissions, again an indication that they were making progress towards a solution rather than just submitting a completed solution in one shot. I am not naive; I know that some students may still have had undue help. But I am comfortable with the effort put forth by the students.

Summary

In the Fall of 2020, I set out to explore how to make my classroom more equitable. I focused on how grading schemes can be unfair to students who have a lot going on outside of the classroom. By changing how grading was done, I feel I allowed the students additional chances to succeed without requiring more grading effort on my part and without creating an antagonistic climate in the classroom. These are some of the very advantages reported in the literature about specification grading.

My one-term experience says that this grading approach works, at least for my class. This much is clear to me: I am not going back to the old way of grading.

Manuel A. Pérez-Quiñones

Puerto Rican PhD in Computer Science, love salsa, sports, diversity, scifi, and comics. Opinions are mine & don’t reflect my employer.