Assessments: The Collateral Damage of SBG
Fair Warning: This post took a month to write. It’s long. It’s involved. It’s also a meditation on my entire year trying to implement a Standards-Based Grading (SBG) system and what that even means. But first, an introduction.
Why It’s Important to Think About Assessment & SBG: My classroom is a game that my students play. I set the rules by how I allow them to succeed or fail in my class. If I’ve done it right, then the rules I set should motivate genuine learning and reflect that knowledge in the form of a ‘grade’. In my experience as an observer in ‘good’ classrooms and ‘bad’ classrooms, the most reliable way to measure this is through independent performance on consistent evaluative assessments balanced with frequent feedback in the form of formative assessments. So, I need my tests and quizzes to be the focus of the ‘game’ that is my classroom, and I need them to behave in such a way that my students find them motivating while I make sure they are an accurate reflection of student performance. And I need all of this to be transparent – the better we understand the rules of the game, the better we are at playing and winning the game. This is all much harder than it sounds.
I’ve been thinking a lot about assessment because I ended my last year unsatisfied with my assessments. I never thought anything was ‘broken’ or a complete disaster, but I never felt like my assessment and grading systems were operating as efficiently as they could be. I found myself constantly retooling my assessments in an effort to find a magical balance between how and when I presented my assessments, how I graded them, and then what my students and I did with those grades.
In looking around for resources, I found Standards-Based Grading (hereafter: SBG). I read Dan Meyer. I read Shawn Cornally. I read Jason Buell. I read Sam Shah. I read Frank Noschese. If you haven’t read these, you should. Seriously. Like, take a break from this, go crazy in the world of SBG philosophy, then have a cup of coffee to let it all process, then come back and finish this post.
Reading all of these authors (and the many others I read but didn’t list) and reflecting on my own experiences in the classroom, I think everyone implements SBG slightly differently. These differences can manifest in a lot of different ways – some people’s SBG system includes changes to homework and quizzes; some people make changes to their classroom structure & procedures; some people make changes to how they grade; some people make changes to how often they assess. I also think some differences have to do with external factors: whether they teach in a science classroom or a math classroom, whether they teach middle school or high school, whether they face high-stakes testing pressures, and whether they teach advanced students (both in the sense of mathematical knowledge and in other student metrics such as notetaking and focus). The thing I found most interesting was the difference in the length of teachers’ lists of standards, as well as the level of cognitive demand for each standard. Some teachers have 100 highly-isolated standards, while some have 20-30 standards that involve synthesis and a high cognitive demand. This is what made me curious about assessments in the first place – if both of these teachers said they were implementing Standards-Based Grading, it was hard for me to believe they were assessing and grading the same way.
Despite all of the differences, there is one thing that every SBG teacher has in common: They split their gradebook into separate standards. On the surface, this seems like a simple change that any teacher can make. However, I tried to trace the effects that this change had on my classroom and found it to be fundamental to the other monumental successes I’ve had this year. In other words, I imagined: ‘What if the first change I made to my classroom was to separate my gradebook into standards – how would this affect other aspects of my classroom?’ I claim that this ‘simple’ gradebook change causes so much collateral damage that it forces you to fundamentally shift several aspects of your classroom, leading to all of the homework and classroom and grading and reassessment policies that I’ve read about on other blogs. Reading what others have written about SBG, I think we’re all finding ways to deal with the collateral damage that SBG has created.
So, what follows is what I’ve pieced together from how I handled this change to my classroom – the things I realized I needed to adjust and why I needed to adjust them. I think of them like dominoes falling on one another, and it all starts with…
Fundamental SBG Change: I decide I want to represent each standard separately in the gradebook.
The First Domino: I need to choose the standards I want to assess. This makes me reflect on my curriculum, but I immediately see that I need to start doing it at a more micro level. I am no longer content with thinking of my curriculum as a series of ‘units’ with a singular heading such as ‘Unit 3: Parallel Lines’. I need to understand the standards that compose each of my units, which means I need to start thinking about the types of problems and concepts my students need to understand and show proficiency in by the end of the unit. I’ve probably already done something like this – I have vocabulary and homework assignments and types of problems already organized in the units – but these are still just general ideas I have bouncing around in my head. I need to reorganize them into explicit, concrete, measurable and assessable standards.
The vocabulary and conceptual foundation are pretty straightforward for each unit, but then I look at all the problems we solve in a given unit. And I realize there is such a variety of problems within any particular unit that it is almost unimaginable to turn each one into an assessable standard! This is my biggest criticism of textbooks – they jump around too frequently in the types of problems they make available. I need to decide which types of problems are the most important, either because of their relevance in the rest of the curriculum or because of how clearly they show whether a student understands a particular standard. And then I realize that I don’t know if I really care about all of these different types of problems – all of these special applications of parallel lines or how algebra magically appears in my unit on triangle congruency – maybe I just want to find the problems that are fundamental to this unit and figure out a way to assess them honestly, in a way that requires some real cognitive legwork from my students.
Pre-SBG Example: I’m teaching a unit on matrices in Algebra II. Given 3 points in the coordinate plane, did you know you can use matrices and determinants to find the area of the triangle they form? I think that’s pretty awesome – my kids think it’s just okay. We spent a day on it towards the end of the unit as an excuse to continue practicing matrices and determinants. There was a homework assignment, then we moved on to something else. Do these problems demonstrate an application of the underlying concepts – matrices and determinants? Yes, they do. Is it an application so specific that we’re never going to talk about it again as the unit and year progress? Yes, it is. Am I tempted to include one question on my test to send the message “HEY! We spent a day on this in class and we had a homework assignment on it, so you better do it because I said it would be on the test!” Yeah, I’d probably do this. But the real message it sends is “I use my tests to reinforce that you should be doing my homework for arbitrary reasons and to punish you when you don’t.”
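For readers who haven’t seen the trick, here is a minimal Python sketch of the determinant formula for the area of a triangle. The function names `det3` and `triangle_area` and the sample points are made up for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix via cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def triangle_area(p1, p2, p3):
    """Area = 1/2 * |det| of the matrix whose rows are (x, y, 1)."""
    rows = [(x, y, 1) for x, y in (p1, p2, p3)]
    return abs(det3(rows)) / 2


# A right triangle with legs of length 4 and 3 along the axes.
print(triangle_area((0, 0), (4, 0), (0, 3)))  # 6.0
```

If the three points are collinear, the determinant is zero and so is the area.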
SBG forces me to see the dishonesty in the situation above and how out-of-place this problem is in my curriculum. It forces me to recognize that maybe the reason this lesson is unsatisfying every year is because I’m not doing it for the right reasons. It forces me to realize that if I’m going to do SBG, I either need to expand on this topic, or I need to ignore it completely. But SBG also makes me realize that if I ignore it completely, I need something to act as a ‘capstone’ to this unit. Something that we build up to. Maybe my colleague decides to continue to develop these problems and end the unit with a project of some kind, whereas I decide to show my students how matrices and determinants let computer graphics perform rotations and transformations in the coordinate plane. Both applications are interesting and allow each of us a bit of creativity in the curriculum.
Suddenly I’m examining my curriculum with a much closer lens. I’m dissecting units and removing bloat that I don’t need while also expanding certain problems to be the ‘big idea’ or ‘capstone’ of the unit (I’ve started calling these Synthesis Standards). My curriculum is becoming more concise and more coherent – the lines between where one unit ends and another begins start to blur, since many of the skills build on each other. I realize that certain concepts can be rearranged and should be included in earlier units or maybe somewhere much later instead. As a consequence of this closer lens, I start to think about the assessments themselves. Since each standard is graded separately, I need to separate my standards on my assessments somehow. I chose to make each page a separate standard, but I’ve also seen assessments where each problem is a separate standard. I start to imagine the format and the types of problems that will be on the assessment. And then the second domino falls…
The Second Domino: I need to be extremely careful about the things I decide to assess and how I write my assessments because I can no longer include useless problems. In the past, I would prepare a ‘Parallel Lines Unit Test’, which would have a collection of problems that fell under the heading ‘These problems involve Parallel Lines’. Maybe I look through the notes and bellworks and homeworks and say “Oh yeah – we did problems like this. And like this. And one like this… Oh, and we should have one like this too”. My test becomes a mishmash of problems requiring various levels of cognitive demand. I include some ‘gimme’ problems at the beginning as well as some ‘extension’ problems at the end.
This is Retrospective Test Design – design based on things we’ve talked about and things students should do. Even if it’s a test I’ve used many times before, it was probably first designed Retrospectively. SBG Doesn’t Let Me Do This Anymore. I’m holding myself accountable to individual standards, not the holistic idea of a unit. So each question/page needs to be connected to a specific standard and I need to think carefully about what these questions should be, how many questions of a certain type I should ask, and whether or not they really measure what I want my students to know (procedural vs conceptual).
Since I’m including fewer problems and am being more explicit about targeting specific standards, my assessments become shorter. I can usually fit them on one page. Sometimes I find that I only need one problem to assess a particular standard – I just need to make sure that the problem is complex enough to encompass all of the procedural knowledge that I want to cover. Sometimes I need a mix of procedural questions and slightly open-ended conceptual questions (describe… explain… draw…). Finding the right balance in assessment questions is difficult – I’m still figuring it out. The results from an assessment should be clear – too many questions may blur how much a student really understands; too few questions don’t give me enough confidence in their abilities. Marzano talks about the ‘observed score’ vs the ‘true score’ of a test – the true score represents what a student actually knew on a test, while the observed score factors in things like luck, guessing, misreading directions, etc. The goal is to design assessments that make this gap as small as possible. I tell my students that their work on an assessment is like an argument – they’re saying “I know how to do this!” and I’m saying “I don’t believe you! Prove it!” The more correct answers, the better the explanation, the more work they show, the more consistent they are – the more likely I am to believe them.
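The observed-score idea is usually written as a one-line equation in classical test theory. The symbols below are the conventional ones from that literature, not notation taken from this post:

```latex
X = T + E
```

Here $X$ is the observed score, $T$ is the true score, and $E$ is the error term that absorbs luck, guessing, and misread directions. Shrinking the gap between observed and true score means designing assessments where $E$ stays small.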
So I started thinking about which standards I wanted to assess, which affects my curriculum. Then I started thinking about which questions to include in order to paint an accurate picture of understanding, which affects my assessment design. This means I need to grade in a way that accurately represents how well a student understands these standards. And then the third domino falls:
The Third Domino: I realize that grading is fundamentally subjective.
Pre-SBG Example: I give a test that’s 4 pages. It includes items from various parts of the unit that I expect students to be able to do. It includes some very basic memorization-esque questions, a variety of basic application questions, and then some more complex questions. Then I decide how I want to assign points. I choose to do it in such a way that a student can’t get an A unless they answer the advanced questions; they can’t get a C unless they answer at least half of the basic application questions correctly; they will probably fail if they can’t answer the very basic questions correctly. ‘Explain’ / ‘Describe’ questions are tricky to grade in this way. I weight each problem differently so I have an idea of where grades will fall depending on how students do on the test.
This process becomes even more muddled if I decide to assign partial credit. Maybe those advanced/complex problems become worth a ton of points because of all the partial credit involved, allowing students to receive a passing score even if they don’t actually get a single question completely correct. Maybe this is okay for some problems, maybe it isn’t for others. ‘Explain’ / ‘Describe’ questions are still a little tricky to grade.
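To make the partial-credit problem concrete, here is a small Python sketch with invented numbers: a hypothetical four-question test, made-up point values, and an assumed passing cutoff of 60%. None of these figures come from a real test.

```python
# Hypothetical test: (points possible, points earned through partial credit).
# No question earns full marks.
questions = [
    (10, 7),   # basic question, small error
    (10, 6),   # basic application, setup only
    (30, 19),  # advanced problem, several steps credited
    (50, 32),  # advanced problem, several steps credited
]

PASSING = 0.60  # assumed cutoff, for illustration only

possible = sum(p for p, _ in questions)
earned = sum(e for _, e in questions)
fully_correct = sum(1 for p, e in questions if e == p)
percent = earned / possible

print(f"{earned}/{possible} = {percent:.0%}, fully correct: {fully_correct}")
print("passes" if percent >= PASSING else "fails")
```

With these numbers the student passes with 64% without getting a single question completely right, and most of those points come from the two heavily weighted advanced problems.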
So I start grading these tests. As I do it, I feel like the grades are ‘fair’ because the system for assigning points has been decided beforehand and will hopefully create a grade distribution that matches up with what I think the distribution should be. If a student were to ask me why they earned the score that they did, I could answer ‘because this problem was worth ____ points and you left out these steps, so you didn’t get these points’. Because of my careful planning, students tend to fall about where I expect them to – and if they don’t, I analyze the exam to see if there was a question that was poorly worded or if I need to adjust how I assign points.
Last year, the tests that I was ‘proud of’ were the ones where students’ grades tended to fall where I expected them to and the data I could gather was the most clear. My sense of pride had little to do with the test itself – it had to do with how well I decided to grade it. This is a strange thought.
The Third Domino (Repeated for emphasis): Grading is fundamentally subjective. Everything I’ve described above is fundamentally arbitrary – I was the one who decided how to assign points so it would match my opinions about where grades should fall based on which problems students should do. The point system exists purely to create the facade of an objective system between me and the grades I assign so that when a student asks me ‘Why didn’t I get a higher score?’, I can blame the system. This creates a disconnect for every person involved – the student’s grade is dependent on this system of grading, and I am a slave to implementing it properly. A shift is created for both of us – we are both trying to game the points system: my student is trying to do whatever it takes to earn those extra points, and I’m trying to do whatever it takes to make sure the points system is designed so it accurately awards points. This is fundamentally dishonest.
So I make a decision: I fully embrace that grading is subjective and, instead of using grades as a barrier, I use grades as a method of feedback and transparency. This becomes my goal: every ‘grade’ on one of my assessments should act as a message from me to my students: you’ve mastered this, you get most of it, you still need work, or you have no idea. This has always been my goal with how I assign points in the past, but the narrowness of my assessments doesn’t let me hide behind points anymore. Instead, I start to create a holistic grading rubric – I choose to let the numbers 1-5 represent how well I believe a student understands a concept. I do this mostly out of the desire to be more transparent and to provide feedback, but also because I think about what my gradebook will look like. With every standard separated, I get more clarity of data. However, this clarity from separating the standards is useless unless I also have clarity in what my grades mean. This is something that a points system is designed to obscure, but that a feedback-based rubric is designed to embrace.
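Here is a rough Python sketch of what separating the gradebook buys. The standard names, the scores, the rule that the most recent score on a standard is the one that counts, and the threshold of 4 are all assumptions made for illustration, not a description of any particular gradebook.

```python
# Hypothetical gradebook for one student: a list of 1-5 rubric scores per
# standard, in the order the assessments were taken.
gradebook = {
    "Identify angle pairs formed by a transversal": [3, 5],
    "Solve for unknown angles with parallel lines": [2, 3, 4],
    "Prove two lines are parallel": [2],
}


def current_levels(book):
    """Assumed rule: the most recent score on each standard is the one reported."""
    return {standard: scores[-1] for standard, scores in book.items()}


def needs_work(book, threshold=4):
    """Standards whose current level is below an assumed proficiency threshold."""
    return [s for s, level in current_levels(book).items() if level < threshold]


for standard, level in current_levels(gradebook).items():
    print(f"{level}/5  {standard}")
print("Reassess:", needs_work(gradebook))
```

A single ‘Parallel Lines Unit Test’ percentage would fold these three numbers into one and hide which standard actually needs the reassessment.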
Now when I grade, I find myself trying to tease out how many errors are based on carelessness vs true conceptual misunderstanding vs problem-solving blocks vs a flawed foundation that is affecting the current topic. I try to decide how each of these factors impacts the type of feedback I want to give – some standards (especially foundational ones) leave no room for carelessness, while some standards leave a little room for flexibility depending on what exactly I’m trying to measure. If a student makes a small mistake at the beginning of the problem, I follow the mistake through the rest of the problem to see if they really understand the process and problem-solving techniques. Now the assessments I’m proud of are the ones that have questions that really pull at the conceptual underpinnings of my curriculum – that have the right balance between process, explanation, and application so that I can get a truly clear picture of how well a student understands a concept.
And so, after making one ‘tiny’ change to my gradebook, I found myself reevaluating my curriculum, assessments, and grading criteria, all in service of my goal of accurately representing how well my students are doing standard-by-standard.
The Next 50 Dominoes: At this point, I’m now adjusting other aspects of my classroom in order to deal with the other adjustments I’ve already made to my curriculum, assessments, and grading. These adjustments appear in the form of specific grading rubrics, frequency of assessments, how to do reassessments, how to assign/grade homework, how to write assessments, how I assign projects, etc. These are all things I’ve thought about, and while I like a lot of what I do, I’m not convinced it’s the best for every teacher who wants to try SBG. My recommendations about these things might work for me because I teach sophomore geometry during a year when there’s pressure to pass the high-stakes exam; there might be better ones for someone teaching middle-school science. So, I won’t say too much more in this post, but maybe I’ll share some of these things individually in their own posts. But for right now, my current SBG implementation looks closest to the one written by Jesse Wilcox called Holding Ourselves to a Higher Standard.
Anyway – I think this post is long enough and I want to be done with it. I think the point of this post is: I believe that SBG, at its fundamental level, is only changing your gradebook so you grade individual standards. However, this change forces you to face realities about a traditional classroom that you can’t ignore and that you are forced to react to. In reacting to these things, we all create our own slightly different SBG systems to address our curriculum, our content, and our student demographics. And what I wrote above are my own reactions to what I needed to change from last year to this year. It was incredibly difficult and unstable at the beginning of the year because I hadn’t yet realized just how much of my classroom I had accidentally changed. Another way to say it is: I changed the rules of the game and I was getting used to the rules at the same time my students were, which can be frustrating. However, now that the year’s almost over, spoiler alert: my assessments are running at max efficiency and I can’t imagine running my class any other way.
Thanks for reading. Geez this is long.
Other Readings that Influenced This Post:
Classroom Assessment & Grading That Works by Marzano
A Conversation about Grading from Jason Buell (The Bottom Few Paragraphs)