Authored by Victoria Petkovic-Short
When you see a claim backed up by statistics, how do you know if it is true?
We are bombarded by statistics every single day and, as we explore in our blog on “the problem with statistics”, they are given far more credibility and influence than they perhaps deserve. Statistics are a key tool in the spread of misinformation and the manipulation of the masses, so here’s our quick-start guide to working out whether a stat is true, or a load of old hokum…
Go back to the original source
Frequently, when you encounter a statistic, you’ll have seen it in an article, on a website, or on social media. The trouble is that the author is often selecting the statistic, or series of statistics, that best supports the point they are trying to make (see also confirmation bias). Shock horror, they may even be making it up! That’s why it is essential to trace the original source of the data, to see how credible it is and whether it actually demonstrates the point being made, or whether you could draw different conclusions in the wider context. A quick way to do this: if the article cites the original source, go to that source’s website; if it doesn’t, copy the statistic and paste it into a search tool to find a resource that does cite the original.
Can you trust the source?
Once you’ve found the original data, consider how trustworthy that source is. Who has produced the data? Why have they produced it? What was their aim in doing so? A peer-reviewed academic journal is far more likely to be reliable than a business down the road doing a straw poll of its customers. Understanding the purpose of the data collection, and, if you can find it, the method of data collection, can help you judge whether the statistic is true and representative.
Becoming a first-time mum last year, I fully immersed myself in the world of parenting, reading all sorts of content to try and ensure I do a fair-to-good job of raising my child. Along the way, I encountered numerous examples of just how important it is to understand the origin of statistics. For example, when deciding how early to introduce potty training, we came across the “when they’re ready” movement, which suggests that your child will indicate readiness for potty training somewhere between the ages of two and four. This movement has been cited, recommended and shared across the world via blogs, social posts and numerous books, but, being an insatiable reader and challenging myself to think critically, I decided to trace the origins of the study that sparked it. Cue a few days’ research (when I actually had spare time, before kids) and I eventually traced it back to an American child psychologist who purportedly studied a large group of children and came to this conclusion. The problem was, that same child psychologist was sitting on the board of Pampers Nappies at the time he produced the study, a company that would directly profit from an increase in the average length of time a baby spends in nappies. Needless to say, my faith in the study fell apart shortly after.
Do you truly understand the statistic, and can you deem it reliable?
There are all sorts of statistics: a count – the number of people, objects or things; a qualitative study – the collated opinions of a group of people; and many more besides. These statistics offer very different types of information, and it’s essential that you understand exactly what data is being collected.
Covid-19 is a perfect example of this. A quick search for the infection rate or death rate will turn up multiple sources with apparently conflicting figures. The problem is that the parameters behind those statistics might be different. One study might count everyone who reported symptoms, another might count only positive antigen tests, while a third may estimate the total number of cases based on the expected proportion of people who are asymptomatic. That’s not to mention different geographic boundaries, of course. None of these statistics is necessarily wrong, but each collates a different set or subset of data, which makes them exceedingly difficult to compare. It is therefore essential that you understand exactly what you are reading, and whether it is the information you expected to be reading, in order to judge its reliability.
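To make that concrete, here is a minimal sketch with entirely made-up numbers (the population, case counts and asymptomatic share below are illustrative assumptions, not real Covid-19 data) showing how three common definitions of “infection rate” produce three different headline figures from the same outbreak:

```python
# Hypothetical figures to show how different definitions of
# "infection rate" diverge even for the same underlying outbreak.
population = 100_000
reported_symptomatic = 2_000    # people who reported symptoms (made up)
positive_antigen_tests = 1_200  # lab-confirmed positives (made up)
asymptomatic_share = 0.5        # assumed fraction of cases with no symptoms

# Three "infection rates", each expressed per 100,000 people:
rate_symptoms = reported_symptomatic / population * 100_000
rate_tests = positive_antigen_tests / population * 100_000

# Scale confirmed cases up to estimate symptomatic + asymptomatic cases.
estimated_total = positive_antigen_tests / (1 - asymptomatic_share)
rate_estimated = estimated_total / population * 100_000

print(rate_symptoms, rate_tests, rate_estimated)
# Same outbreak, three very different headline numbers:
# 2000.0 1200.0 2400.0
```

None of the three calculations is wrong; they simply count different things, which is exactly why comparing such figures across sources is so hazardous.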
Has it been manipulated or inflated?
As we explored in our article on the “problem with statistics”, the way statistics are taught in Western education leads to some common misunderstandings. Something can be true and sound significant without actually being so. If we put a group of 500 people in a room and found that one person had a headache, then played really loud music and found that two people had a headache instead, we could say that loud music resulted in a 100% increase in headaches, or that it doubled the number of headaches. In reality, while both statements are true, headaches still only affected two people overall, with 498 unaffected.
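The headache example above boils down to the gap between relative and absolute change, which a few lines of arithmetic make plain (the numbers are the ones from the example):

```python
# The headache example: absolute vs relative change.
room_size = 500
headaches_before = 1  # one person with a headache
headaches_after = 2   # two people after the loud music

# Relative change: sounds dramatic.
relative_increase = (headaches_after - headaches_before) / headaches_before
print(f"Relative increase: {relative_increase:.0%}")  # Relative increase: 100%

# Absolute change: far less alarming.
absolute_increase = headaches_after - headaches_before
unaffected = room_size - headaches_after
print(f"Absolute increase: {absolute_increase} person; {unaffected} unaffected")
```

Whenever a headline quotes a percentage increase, asking “an increase from what to what?” exposes whether the underlying numbers are as dramatic as the percentage suggests.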
Is it a correlation or a causation?
Something you will often see with statistics is a correlation being misrepresented as a cause. A correlation is a relationship between two variables that may or may not be linked; causation is a direct relationship between two variables where one directly results in the other.
The reason this matters is that it determines whether there is genuine cause and effect within the data. For example, the satirical “Theory of the Stork” study identifies a causal link between the number of storks in Germany and an increase in the birth rate: it hypothesises that, “because storks deliver babies”, more storks mean more babies. The study presents a causal relationship between the two variables when in fact there is merely a correlation (if there is one at all) – two things happening at the same time that are otherwise unlinked. Understanding the difference between cause and correlation is essential in determining whether the data can be relied upon.
While not an infallible method for proving a statistic right or wrong, our rule-of-five guide can help reduce the risk of your being manipulated by statistics.
We also particularly enjoyed “The Art of Statistics” by David Spiegelhalter, which explores many of life’s modern misconceptions using data, as well as “How to Read Numbers” by Tom and David Chivers, which is a great guide to understanding statistics.