Analyzing Raw Election Data
Hey #MTBoS – long time, no see. I wanted to stop in and share something kinda cool I did with my AP Computer Science Principles class (wait – what?)
I’m teaching a brand new course this year, AP Computer Science Principles. I’ve mostly been following the curriculum provided by Code.org, which has been excellent – I dig their philosophy of providing open, Creative Commons-licensed resources to benefit everyone, and I’m totally bought in to their underlying principles of equity and ‘this is not just another coding class’. One of the big ideas of the course is Big Data – the idea that computer scientists manipulate and transform data into something presentable and look for actionable patterns or trends.
I had been looking around online for different ideas of how to address Big Data and, frankly, I wasn’t satisfied with what I was seeing. Most places suggested having students create a survey, have lots of people take it, then look at the data and perform some analysis on it to identify trends and patterns. I disliked this for two reasons, both of which come from my experience as a math teacher and my acute awareness of pseudocontext – wrapping up a task in an inauthentic experience. Since the survey is a means to analyze the data rather than the true focus of the unit (as it might be in a statistics class), this almost necessitates that it be superficial and quick, and it probably won’t lead to any truly meaningful insights – not great. I also didn’t like that a ‘large’ survey done this way would have maybe 100 data points, which isn’t anywhere near what a truly ‘large’ data set is in the computer science world.
If I was going to do this unit, I wanted students to look at real raw data on a scale where it is only feasible to use a computer to analyze it, and whose analysis could provide real insights. So, I went around looking for raw data sources and found this Forbes article that pointed me to a lot of good places, but it wasn’t until I found FiveThirtyEight’s Elections page that I really got excited.
FiveThirtyEight has a reference to every poll it uses in its model. One of those polls is a Google Consumer Survey, which opens automatically in Google’s DataStudio (brief pause: HAVE YOU SEEN GOOGLE’S DATASTUDIO?!?! Let’s talk about this more at the end of this post because oh man oh man does it look cool). The survey has 4 questions: how likely are you to vote, who would you vote for, what’s your gender, and what’s your age range. At the bottom of the page is a link to the data used to generate that model – but, even better, a link to all the historical data going all the way back to August 2016. Each .csv file has 20,000+ entries that I could open in Excel and start playing around with.
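(If you’d rather poke at one of those files in code instead of Excel, here’s a rough sketch of loading a poll CSV and tallying responses with Python’s standard library. The column names and rows below are made-up stand-ins for illustration – check the header row of the actual download before adapting this.)

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for the real 20,000+ entry download;
# the column names here are guesses, not the file's actual headers.
sample = io.StringIO(
    "Vote Choice,Gender,Age\n"
    "Hillary Clinton,Female,25-34\n"
    "Donald Trump,Male,45-54\n"
    "Hillary Clinton,Male,18-24\n"
)

# DictReader gives one dict per response, keyed by the header row.
reader = csv.DictReader(sample)
tally = Counter(row["Vote Choice"] for row in reader)
print(tally.most_common())
```

Swapping `io.StringIO(...)` for `open("poll.csv")` is all it takes to run the same tally on a real file.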
And this is when I got really really excited because I had, at my fingertips, real, raw, meaningful data about an event that was real, raw, and meaningful for my students. I had a vehicle for students to use a computational tool to become more critical citizens and have a meaningful interaction with data that wasn’t superficially imposed by me – analyzing whether more girls or boys like dogs versus cats sounds pathetic when placed next to predicting who will run our country for the next four years. And suddenly this Big Data unit became one of the most exciting things I was doing with my students.
Jumping ahead a bit, here’s what they came up with (flaws and all). The assignment was to pick two consecutive data sets and find a story to tell about them. We did this towards the end of September, so most picked dates around then (which happened to be before and after the 1st presidential debate). I showed them how to create pivot tables and charts (more on that below), then had them work in pairs and choose a state to focus on. They put their analysis in a shared Google Slide so everyone could see each other’s work. Even though this was the finished product, this isn’t where I started – there were a few days that led up to this:
On day 1, I wanted to build up the idea that visualizing big data is important as a way to understand and communicate the data, and no one is better at making that argument than Hans Rosling. I showed the first half of his 2006 Ted Talk where he looks at 3rd world vs 1st world using Gapminder, then I showed a minute or so from the end where he gives a mini call-to-action regarding connecting big data to visualizations so it can be communicated to the world. We looked at some really terrible data visualizations – I got mine from Code.org, but you can find them anywhere – then we visited Gapminder World and I had them explore one of the example graphs and then create their own. One really cool thing about Gapminder is all of their data is available for download, so we grabbed the Population Data with Projections, imported it into Excel, cleaned and filtered it, then picked two countries to compare their populations. Pretty full day – here’s what they came up with.
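(The clean-and-filter step we did in Excel translates to just a few lines of Python, if that’s more your speed. The tiny table below is a stand-in for Gapminder’s population spreadsheet – one row per country, one column per year – with the country and year choices purely illustrative.)

```python
import csv
import io

# Tiny stand-in for Gapminder's Population Data with Projections
# (one row per country, one column per year); the real download has
# many more of both, but the filtering step is identical.
sample = io.StringIO(
    "country,2000,2010,2020\n"
    "Sweden,8872000,9378000,10353000\n"
    "Norway,4491000,4889000,5379000\n"
    "Kenya,31450000,41518000,53771000\n"
)

# The 'filter' step: keep only the two countries we want to compare.
chosen = {"Sweden", "Kenya"}
rows = [row for row in csv.DictReader(sample) if row["country"] in chosen]

for row in rows:
    print(row["country"], row["2000"], "->", row["2020"])
```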
On day 2, I explained the Netflix Prize as a way to introduce the next data set we’d look at: movie ratings with demographic info. Code.org has a lesson based around analyzing a subset of this rating data, but they also have entire large data sets available for download, which is really awesome – again, I’m a big fan of their open sharing philosophy. We looked at the smaller subset first and I showed them how to make pivot tables in Excel so they could average ratings by gender or age, or look at the distribution of ratings for a particular movie, and so on. Lots of stuff to play with. We ended by looking at the full data set – they had to pick at least 5 movies and analyze them in some way. Here’s what they came up with.
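(For the curious: what a pivot table does under the hood is just group-then-aggregate, which you can sketch in a few lines of Python. The movie titles and ratings below are hand-made stand-ins, not rows from the actual data set.)

```python
from collections import defaultdict

# Hand-made stand-in for the ratings data: (title, rater gender, 1-5
# rating). The real download has the same shape, just far more rows.
ratings = [
    ("Toy Story", "F", 4), ("Toy Story", "M", 5),
    ("Toy Story", "F", 5), ("Jumanji", "M", 3),
    ("Jumanji", "F", 4),
]

# The pivot-table step: group scores by (movie, gender)...
groups = defaultdict(list)
for title, gender, score in ratings:
    groups[(title, gender)].append(score)

# ...then aggregate each group (here, by averaging).
averages = {key: sum(vals) / len(vals) for key, vals in groups.items()}
for (title, gender), avg in sorted(averages.items()):
    print(f"{title} ({gender}): {avg:.2f}")
```

Changing the grouping key (age range instead of gender) or the aggregation (a count of each rating instead of a mean) gives you the other views the kids built in Excel.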
From there, we dived into the Election Data. I gave them the files, told them they’d need to use the skills we developed in the last few days, then set them loose. The only real requirements were to work together and try to find a pair of graphs that you could use to support a statement about a change in the voting pattern for a particular demographic. We spent a few days working in-class, and still some groups weren’t able to finish. The freedom to play with the data and choose something that interested them led to some great conversations and lots of engagement – I was happy with how seriously they took the assignment. I also really liked everyone having a common Google Slide for students to post to – it’s something I’ve started doing with other aspects of this class and it’s worked out really well so far.
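(The core computation behind those pairs of graphs – a candidate’s share of one demographic, before and after – is simple enough to sketch directly. The two little lists below are stand-ins for two consecutive poll files, and the names are just placeholders.)

```python
# Sketch of the before/after comparison the students did with charts:
# given two lists of (gender, vote) responses standing in for two
# consecutive poll files, compute a candidate's share among one
# demographic in each, and look at the change.

def share(responses, gender, candidate):
    """Fraction of `gender` respondents who chose `candidate`."""
    subset = [vote for g, vote in responses if g == gender]
    return sum(1 for vote in subset if vote == candidate) / len(subset)

# Stand-in data; in the real task each list came from one day's .csv.
before = [("F", "Clinton"), ("F", "Trump"), ("F", "Clinton"), ("M", "Trump")]
after = [("F", "Clinton"), ("F", "Clinton"), ("F", "Clinton"), ("M", "Trump")]

change = share(after, "F", "Clinton") - share(before, "F", "Clinton")
print(f"Change among women for Clinton: {change:+.0%}")
```

With only a handful of respondents the percentages swing wildly – which is exactly the sample-size problem the Vermont group ran into below.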
In retrospect, I wish I had narrowed the states students could choose – one group picked Vermont and, for some reason, barely anyone in Vermont was polled (I think fewer than 30 people), so it was difficult to make a bigger statement because the sample size was so small. I also wish I had defined my expectations a little better – ultimately I added the checklist at the front of the presentation, but that wasn’t there to start with. I would keep the overall vagueness of the prompt – ‘make the data tell a story’ – but I would have some exemplars to show, similar to how Gapminder has a few examples students can see before playing around with the data themselves. Sentence frames would also help students form the type of written response I was looking for, but even that can subtly restrict what students look for or find important.
So – all of that is the gist of the project. Here are a few more random thoughts:
- Exploring this project also led me to discover Google’s DataStudio, which looks really awesome for creating visual analytics for data. I went through their tutorial of linking data to their dashboard widgets and it’s pretty powerful. If I had more time, this is definitely where I would push students to develop their skills.
- Code.org has a link to this nifty style guide on data visualizations, which was really great to show students on the second day of doing the election slides.
- There are still 10 days left until the election, so you’ve got some time to try this yourself if you’d like. I highly recommend it if you’re teaching Computer Science Principles, and maybe even if you’re not, for the reasons below.
One Final Note: I just got back from NCTM Phoenix where I was lucky enough to attend Karim Kai‘s session on the value of application problems in a math curriculum. Part of his argument [paraphrased by me] is that a true application task starts with mathematics as its premise, and uses that to understand the world (rather than using the world as its premise and using that to understand mathematics). We went through this Mathalicious lesson which was really awesome, then he answered some questions about ways to implement these types of lessons on a teacher or school level.
One comment that’s really stuck with me was his appeal that doing more true application problems has the potential to create a better informed student who is an active, rational, thoughtful participant in our society. I’ve been thinking about this as I’ve been reflecting on this task: the goals were a bit vague, we spent less than a week on it, and we’re going to move on to something completely different afterwards. These are all the things that usually swing the pendulum towards “better to skip it”, which I almost did (remember that? 3rd paragraph in this post?). But, I’m glad I didn’t: in the moments where my students were analyzing their data and trying to find the story within, it felt like something bigger was happening. That I was engaged in conversations that were bigger than the little universe that lives inside the four walls of our classroom, that the energy level and thirst to know more was amped up a little more than usual, and that these conversations were continuing beyond our walls with people whom I’ve never met. And this made the lesson valuable in ways I didn’t expect, probably because I could see Karim’s point being acted out right in front of me.
So, I guess I’m saying: if this fits into your class in any way – sorting and filtering data, making visualizations in Excel, making pivot tables – I’d highly recommend giving it a try as a true application task.
So – there’s that. Thanks for reading.