Eric Gaze is the Director of the Quantitative Reasoning Program at Bowdoin College, and also a Senior Lecturer in Mathematics. The example outlined here is an activity he uses in an Introduction to Data Visualization course; it can also be used in an Introductory Statistics course, and some Liberal Arts Math courses.
Engaging students with data requires that they possess a basic facility with spreadsheets and proportional reasoning. Students are almost always presented with data in the form of a spreadsheet, whether it be a text file, a comma-separated values (CSV) file, or an Excel file. Data manipulation and wrangling are typically required to get the data into a usable form for more sophisticated programs like Tableau and R. This data cleaning often involves fundamental proportional reasoning skills, from rescaling values to a common index value to computing percentage change. In order to prepare students to clean data, instructors need to activate students’ prior knowledge of these more algebraic topics. Knowing how to set up the proportion “3 is to 4 as x is to 100” and solve for x is an arithmetic problem. Transferring this skill to a spreadsheet, where you take raw numbers of births or crimes and create rates per 1,000 or 100,000, is an algebra problem! The cell references of the values in the spreadsheet are the variables in the associated algebra problem.
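The cleaning operations named above can be sketched in a few lines of Python. This is only an illustration and not part of Gaze's activity, which uses a spreadsheet. The function names are hypothetical, and every sample number other than the "3 is to 4" proportion is made up for the example.

```python
def solve_proportion(a, b, d):
    """Solve a / b = x / d for x (e.g. 3 is to 4 as x is to 100)."""
    return a * d / b


def rate_per(count, population, per=1_000):
    """Express a raw count as a rate per `per` members of the population."""
    return count * per / population


def percent_change(old, new):
    """Percentage change from `old` to `new`."""
    return (new - old) * 100 / old


def rescale_to_index(values, base=100):
    """Rescale a series so that its first value equals `base`."""
    first = values[0]
    return [v * base / first for v in values]


print(solve_proportion(3, 4, 100))     # 75.0
print(rate_per(50, 2_000))             # 25.0 (made-up count and population)
print(percent_change(80, 100))         # 25.0
print(rescale_to_index([50, 55, 60]))  # [100.0, 110.0, 120.0]
```

Each function is the same proportion with a different unknown, which is the point of the paragraph above: once the inputs are named (as cell references in a spreadsheet, or as parameters here), the arithmetic becomes algebra.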
In the group activity described here, Gaze uses the table of births and birth rates by age of mother published by the CDC in an annual National Vital Statistics Report (https://www.cdc.gov/nchs/data/vsrr/vsrr038.pdf). Gaze shows students a copy of the table published by the CDC from a prior year and instructs them to find the data for the most recent year. Because students are so used to being handed nice clean data on a silver platter (or in a silver spreadsheet!), this search is an invaluable exercise. It is a humbling experience for these “digital natives” to struggle with what seems like the most basic task, a web search. Once the data is found, students must then grapple with copying and pasting data from a PDF into a spreadsheet. Often students resort to simply typing in the data by hand. The table gives the number of births and birth rates per 1,000 women in specified age groups, starting with 10-14 year-olds and increasing in 5-year increments up to 45-54. The table also includes figures for all ages, but students must read the caption carefully to realize that “Rates for all ages are the total number of births (regardless of the age of mother) per 1,000 women aged 15-44” (p. 6).
Students are next asked to use the number of births and the rate for each age group to work backwards and compute the number of women in each age group. This is a tricky proportional reasoning problem that requires students to think algebraically. They can then take the number of births for each age group and compute two different percentages: births in that age group as a percentage of the total number of births, and births in that age group as a percentage of the number of women in that age group. From here students make charts of these percentages, only one of which can be represented as a pie chart. Group work is crucial here so students realize they are not alone in struggling to access their prior knowledge and apply it.
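The steps above can be sketched in Python as a stand-in for the spreadsheet work. The three rows below are hypothetical numbers chosen to give round results; they are not CDC figures, and the variable names are assumptions made for the sketch.

```python
PER = 1_000  # the table reports births per 1,000 women

# Hypothetical rows for illustration only (NOT CDC figures):
# (age group, number of births, birth rate per 1,000 women)
rows = [
    ("20-24", 600, 60.0),
    ("25-29", 1_000, 100.0),
    ("30-34", 400, 50.0),
]

total_births = sum(births for _, births, _ in rows)

results = []
for group, births, rate in rows:
    # Work backwards: births / women = rate / 1,000, so women = births * 1,000 / rate
    women = births * PER / rate
    # Percentage 1: this group's share of all births
    share_of_births = births * 100 / total_births
    # Percentage 2: births as a percentage of the women in this group
    pct_of_women = births * 100 / women
    results.append((group, women, share_of_births, pct_of_women))
    print(f"{group}: women={women:,.0f}  "
          f"share of births={share_of_births:.1f}%  "
          f"births per 100 women={pct_of_women:.1f}%")

share_total = sum(r[2] for r in results)  # parts of one whole
pct_total = sum(r[3] for r in results)    # not parts of one whole
print(share_total, pct_total)
```

With these made-up rows the shares of total births add up to 100, while the births-per-100-women figures do not. That suggests why only the first set of percentages suits a pie chart: a pie chart needs slices that are parts of a single whole.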
Digital Resources
Spreadsheet technology and the internet to search for data
Requiring students to work with rates in a spreadsheet activates prior knowledge in both proportional reasoning and algebraic reasoning. Students are first confronted with the difficulty of finding and cleaning real-world data, a critical realization for anyone working with data. Next, general spreadsheet literacy is reinforced by asking students to enter formulas that create a new variable for use in the data analysis. Working backwards from rates illustrates the subtlety of proportional reasoning in a real-world context. Students must think abstractly about the variables in question (births are to population as the rate is to 1,000) to enter a formula that they can fill down in the spreadsheet in order to find the relevant age group populations. Finally, students are confronted with the importance of quantitative literacy. Understanding what the numbers in your data mean is crucial for any hope of analyzing them correctly.
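The proportion in parentheses above can be written out as a worked equation. The symbols are labels introduced here for illustration: B is the number of births in an age group, P is the number of women in that group, and R is the published rate per 1,000 women.

```latex
\[
\frac{B}{P} = \frac{R}{1000}
\quad\Longrightarrow\quad
B \cdot 1000 = R \cdot P
\quad\Longrightarrow\quad
P = \frac{1000\,B}{R}
\]
```

In a spreadsheet the symbols become cell references. Assuming, purely for illustration, that births sit in column B and rates in column C with data starting in row 2, the fill-down formula would be something like `=B2*1000/C2`.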
Digital Enablement
Without technology, many faculty resort to using small artificial data sets that can be analyzed by hand in order for students to practice simple statistical calculations. It would not be feasible for undergraduates to explore and make sense of a data set summarizing over three million data points without a digital spreadsheet. Using spreadsheets to organize and analyze real-world data allows students to quickly test their reasoning, repeat a series of calculations efficiently, and create graphical displays with ease.