Disclaimer and request for attribution: This image was sent to me via text message from a friend, and I unfortunately do not know the original source of this image. If you know it, please let us know, so we can give proper attribution and credit where credit is due! In the meantime, we are posting it here so we can give commentary, critique, and use it for educational purposes.
A friend sent me this image via text message, with the comment, “using statistics to lie… y axis is hilarious.”
It took me a minute to get my bearings when I reviewed this chart. I thought to myself, “What is wrong with the y-axis? Oh. OH.” And proceeded to break down in giggles and expletives.
What’s wrong with this image?
When I originally wrote this post, I thought that the author of this chart should be applauded for a level of creativity akin to some sort of “evil genius.” In fact, I had the words “evil genius” in the working title. Fortunately, fellow Acorn Analytics team member Amber pointed something out that made me pull back on my harsh criticism of the author. We’ll get to that later on, though. First, let’s look at this amazing image!
Given that there is a news channel logo in the bottom left corner, I assume that this chart was shown for a very brief period of time during a news presentation, and some editorial commentary or summary was made about coronavirus cases.
I feel like the only thing spreading faster than the virus itself is the flood of charts and diagrams built on its data. This is a problem, likely because a) the data is public, b) the data from different regions and countries is not necessarily generated in a consistent way, c) the pandemic and the fear around it are addictive to report on, and d) many of those with the skills to create charts are stuck at home with extra time on their hands.
Data scientists are drawing outside the guidelines
In fact, some of the public repositories for information on the coronavirus are linking to guidelines, such as Tableau’s “10 Considerations before you create another chart about COVID-19.” These considerations were written by Amanda Makulec.
As both a Masters of Public Health and the Operations Director for the Data Visualization Society, she’s an expert in the responsible use of data visualization for public health. She will be helping the Tableau team identify data resources, curate visualizations, and ensure that what is available through the hub is of the highest quality and consistent with responsible information sharing during a critical time.
Unfortunately, it seems that violating recommendations from Makulec and Tableau is not limited to amateur data enthusiasts. It appears that a broadcast television news source created the chart featured above, even though news organizations are generally perceived as holding themselves to a higher ethical standard, with access to experts and resources that normal people lack.
Here are a few of Makulec’s considerations that the chart author should take a closer look at:
- Case numbers are the most readily available, thorough, routinely updated data sources, but that doesn’t make them simple to visualize. […]
- Aggregations and calculations that can be done with the case data are not necessarily what should be done with the case data. […]
- Visualizations should inform and be honest about what isn’t represented. […]
- Make thoughtful design decisions. […]
- Consider the human side of what you create.
- Consider how visualizations can impact (and encourage) social responsibility as we see COVID-19 in our respective communities.
Why this matters
Getting back to the image, what is all the fuss about the y-axis? Here is what I see.
At first, the chart looks normal. Let’s start with the x-axis. It covers the period from March 18 through April 1, the dates are evenly spaced, and every date in that range is labeled along the bottom. This is to be expected.
What’s unexpected, and even extremely surprising, is the y-axis along the left-hand side. Unlike the x-axis on the bottom, it does not increment evenly.
At first, it looks like it will increment by 30 cases, starting at 30, then 60, then 90, then… 100? Wait. That’s not right. Why are we only incrementing by 10 cases now? Is this the new pattern? No, because it starts again incrementing by 30, going to 130, then 160, then 190, then… 240?
We just jumped by 50 cases. Then the next line adds another 10 cases, bringing us to 250. Then the pattern stabilizes, increasing by 50 until it maxes out at 400.
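To make the walk up the axis easier to check, here is a minimal sketch that computes the gap between each pair of neighboring labels. The tick values are the ones read off the chart above; the variable names are my own.

```python
# Y-axis labels as read off the chart, from the bottom gridline to the top.
ticks = [30, 60, 90, 100, 130, 160, 190, 240, 250, 300, 350, 400]

# Difference between each label and the one directly below it.
steps = [upper - lower for lower, upper in zip(ticks, ticks[1:])]

print(steps)               # [30, 30, 10, 30, 30, 30, 50, 10, 50, 50, 50]
print(sorted(set(steps)))  # [10, 30, 50]
```

On a conventional linear axis that second list would contain exactly one number. Here the same vertical gap stands for 10, 30, or 50 cases depending on where you look.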
If this were the scribbling of a child who didn’t know what they were doing, I would think that they just got creative in labeling an evenly ruled piece of paper. However, this is a finished chart, and because each horizontal line is evenly spaced, the y-axis gives the illusion of being something normal.
This. Is. Not. Normal!
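Here is a rough sketch of how much those evenly spaced gridlines distort things. It compares where a value lands on the chart as drawn with where it would land if the same vertical space used an evenly scaled axis. Two assumptions are mine, not the chart author’s: that values between two labels are interpolated linearly, and that height is measured in “gridlines above the bottom label.” The function names are hypothetical.

```python
from bisect import bisect_right

# Y-axis labels as read off the chart, bottom to top.
TICKS = [30, 60, 90, 100, 130, 160, 190, 240, 250, 300, 350, 400]


def drawn_height(value, ticks=TICKS):
    """Height in gridline units on the chart as drawn.

    Assumes (my assumption) linear interpolation between neighboring labels.
    """
    if not ticks[0] <= value <= ticks[-1]:
        raise ValueError("value is outside the labeled range")
    if value == ticks[-1]:
        return float(len(ticks) - 1)
    i = bisect_right(ticks, value) - 1  # index of the label at or below value
    lower, upper = ticks[i], ticks[i + 1]
    return i + (value - lower) / (upper - lower)


def even_height(value, ticks=TICKS):
    """Height in the same units if the axis were scaled evenly end to end."""
    span = ticks[-1] - ticks[0]
    return (len(ticks) - 1) * (value - ticks[0]) / span


for cases in (100, 250):
    print(cases, drawn_height(cases), round(even_height(cases), 2))
```

Under those assumptions, 100 cases sits on gridline 3 as drawn but would sit at roughly 2.1 on an even scale, and 250 cases sits on gridline 8 instead of roughly 6.5. The stretch is not uniform, which is exactly why the shape of the line cannot be trusted.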
One reason that I find this extremely surprising is how difficult it is to create such a manipulation. I’ve been creating data visualizations and working as a professional data scientist for a long time, and I have a wide array of tools for creating charts and graphs. However, I cannot think of any of them that would allow me to make such bizarre changes to the y-axis, at least not without significant tweaking under the hood.
That’s why I am not even sure how this chart was created. I cannot think of any software tool that includes a “feature” that would allow me to manipulate the y-axis to look like the above.
Does the author of this chart have some special piece of software that allows them to manipulate the y-axis so that the news can tell the story they want to tell?
The only thing I can think of right now is that some original chart was created, and then imported as an image into a graphical piece of software, and then the image was manipulated and treated as something other than a chart.
If you know how this was accomplished, can you please let me know? I’m dying to try to recreate this chart, but I don’t know where to begin.
In the meantime, what was it that the author was trying to hide by changing the y-axis? That is where this story gets even WEIRDER. That’s coming up in next week’s blog post!
Mike Zawitkowski is a full stack data scientist. He has worked with big data and machine learning problems since 1999—before “big data” and “machine learning” were trendy.