The Buzz: Correlation does not imply causation, and so all statistics showing correlation should be shunned and ignored forever and ever.
The Anti-Buzz: Correlation does not imply causation. And you’re probably using the word “correlation” wrong whenever you bring that up.
I can’t just talk about computers all the time. So much of what excites me about them is theoretical; it was inevitable that I started talking about math instead of machines. This will be my first attempt at anti-buzzing about numbers, and dentists should be very concerned about numbers because apparently 4 out of 5 of you recommended a particular toothpaste/toothbrush/mouthwash in clinical trials and, well, they can’t all be the favorite brand of 80% of the dentists, now can they? As a subset of the medical community, you shouldn’t need me to tell you how statistics become misleading.
When it comes to science and medicine, there is a growing epidemic of non-reproducibility, and to make matters worse, the media likes to overreact to everything science-sounding. Nutritionists seem to revise the rules every year (eggs are good, eggs are bad, cholesterol is good, cholesterol is bad, cholesterol is bad but eating cholesterol doesn’t raise your cholesterol), and the dark heart of the beast is that, well, a lot of clinical trials of any stripe are never reproduced. Reproducing results makes for a boring read, both inside and outside of the scientific community. Pop-journalism is in the business of selling narratives, and its science articles are usually puff pieces based on old research that has been stashed away to fill-space-yet-still-sound-relevant. (Advice for life: Don’t read any article that begins “Researchers at [Name of Institution] discovered/found …” Double points scored for ignoring articles that describe “research” derived from having college students fill out a questionnaire.)
The story isn’t that science is dishonest, but that pop-science tells a remarkably inconsistent story. It doesn’t help that people want to believe that large impersonal institutions are wrong. We love stories about how studios passed on Star Wars, publishers passed on Gone With the Wind, professors ridiculed the business model for FedEx, and CERN overturned the laws of physics. When it comes to science and statistics, people are more or less able to just grab whatever seems to reinforce the opinions they already have. Similarly, they use the uncertainty of science and the potential for misuse of statistics as a smokescreen to refute anything that supports the opposition (“Those statistics are wrong because sometimes statistics are wrong.”). And this isn’t a political statement because everybody does it.
So when my articles become less computer and more science, righting common abuses of math is my primary motivation. It will pay dividends to you because you are in the business of making smart decisions about technology with the information presented to you (and then making smart decisions with the information given to you by your technology). Also, when robots start diagnosing patients for you, it will be good if you are less cynical about it; it can be easy to dismiss new technology as just another large-institution-mistake.
So, this week’s math rant is brought to you by that old chestnut that everybody seems to spout whenever they want to draw conclusions from pop-science data: Correlation does not imply causation. It doesn’t! And what’s worse is that nothing short of time travel can truly imply causation. It is also good that the general public seems to latch onto this logical truth, because observing correlations is probably the easiest way to lie with statistics.
If it weren’t for our armchair scientists shouting down every would-be correlation we’d live in a world full of people who believed that ice cream causes drownings and cancer causes Hollywood. On the other hand, it is sort of annoying that armchair scientists like to shoot down every correlation as the devil’s lies. A lot can get glossed over in a huff, so here are some things to consider.
No Correlation, No Causation: Causation does imply correlation, so if I can show that no correlation exists, I can argue that a cause does not exist. Armchair scientists love this one. Taking the whole world at once, the vast majority of deaths are not caused by plane crashes. Similarly, the vast majority of air travelers are not killed in plane crashes. It would seem there is no correlation at all, at least according to the armchair scientist, therefore plane crashes don’t kill people. Armchair science in action! A lot of mistakes can be made when drawing conclusions from data in the wrong scope. Taking the entire world’s population is not a good way to demonstrate anything about airplanes. If you limit your attention to plane passengers, and track which ones die and which ones don’t, and note whether or not the plane crashed, I’m sure you’d show a strong correlation between plane crashes and death. Ugh, let’s move on to something less gruesome.
Lesson: Consider the scope. A lot of data you are fed is at a scope inappropriate for demonstrating anything useful.
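To put rough numbers on the scope problem, here is a minimal sketch. Every count in it is invented for illustration, and the helper name `phi` is my own; it computes the phi coefficient, which is just the ordinary correlation coefficient for two yes/no variables. The same 50 crash victims and survivors are measured twice: once against a made-up population of a million, once against only the people who flew.

```python
from math import sqrt

def phi(both, only_x, only_y, neither):
    """Phi coefficient: Pearson correlation for two yes/no variables."""
    a, b, c, d = both, only_x, only_y, neither
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# All counts are invented for illustration.
# x = "was on a plane that crashed", y = "died"
# Scope 1: a whole population of 1,000,000 people.
world = phi(both=40, only_x=10, only_y=9_960, neither=989_990)
# Scope 2: only the 10,000 people who actually flew.
passengers = phi(both=40, only_x=10, only_y=1, neither=9_949)

print(f"whole population: phi = {world:.2f}")
print(f"plane passengers: phi = {passengers:.2f}")
```

With these made-up counts the whole-population figure comes out close to zero while the passenger-only figure is large, even though the crashes are identical in both. Only the scope changed.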
Sometimes Correlations are Coincidences:
The now-classic example is that global warming got worse as old-fashioned sea pirates went out of style. Aside from broad historical trends, there really isn’t a connection between the two. Your intuition can easily sniff out silly arguments such as this. NFL attendance went up at the same time cell-phone ownership did, for example. Not all coincidences are so obvious, however. Nobody ever points out that serial killers typically eat breakfast, but they’ll observe any other normal habit in them if it tweaks them the wrong way. (They listen to rock music! Rock music must be bad!) Coincidences are in fact not really correlations at all, but people go wild every time they see two variables they care about make similar movements. Alarmists want to argue for cause and the armchairs shout “correlation does not imply causation” when what you are talking about isn’t even a correlation. Youth violence plummeted as video game revenue skyrocketed. If you feel strongly about any real-or-fictional connection between violence and video games, you are less likely to consider that this might just be a coincidence. A good many other things are likely responsible for the reduction in youth violence, and similarly, video game revenue was probably not suffering under the jackboot of adolescent crime. Revenue is also a particularly bad statistic; revenue only goes in one direction when population and prosperity increase. Broadway revenue hasn’t decreased even though it is an “obsolete technology.” More people with more money buy more things. Comparing video game industry revenue with anything tells you basically nothing.
Lesson: Just because variables coincide does not mean they are related.
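Here is a small sketch of why anything that only goes up is such a poor witness. The two series are simulated, not real data: the names `revenue` and `attendance`, the slopes, and the noise levels are all assumptions I picked for illustration. Each series is an upward trend plus its own independent random noise, so by construction neither has anything to do with the other.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def changes(series):
    """Period-over-period differences, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

rng = random.Random(0)  # fixed seed so the run is repeatable
periods = range(400)
# Two made-up series: each is a steady climb plus independent noise.
revenue = [100 + 5 * t + rng.gauss(0, 10) for t in periods]
attendance = [20 + 2 * t + rng.gauss(0, 10) for t in periods]

r_levels = pearson(revenue, attendance)
r_changes = pearson(changes(revenue), changes(attendance))

print(f"raw series:            r = {r_levels:.2f}")
print(f"period-to-period moves: r = {r_changes:.2f}")
```

The raw series produce a coefficient near 1 simply because both climb. Compare the period-to-period changes instead and the apparent relationship should all but vanish, which is one quick sanity check to run before getting excited about two lines that rise together.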
Sometimes Data is Irreproducible:
Sometimes bad data sets are all we have available. From a purely statistical point of view, it would seem that there is a negative correlation between pirates and global warming, but we can’t very well go testing this hypothesis because we can’t just re-institute large scale global piracy. Again, I’m using silly examples, but very often we see data that describe large populations over long periods of time. While these things might demonstrate trends, you can’t just do the industrial revolution or World War II over again to see if you can replicate a possible correlation. A lot of “correlations” sneak by because we can’t technically prove they aren’t correlations, but any scientific claim about history that requires time travel to demonstrate is sketchy at best. And by history I mean any chart that demonstrates something “over time.” Ignore all of them.
Lesson: Stats about history provide a good picture of history, but nothing more. (I’m ignoring regression analysis here, I know).
Empirical Science Tries to Show Correlations:
The flip side of believing bad statistics is skepticism toward good statistics. A true correlation almost always means something. Shouting “correlation does not imply causation” at a carefully controlled experiment is just rude. Any number of things might be wrong with the research in question, but the entire point of empirical testing is to try to create an environment where a correlation has a high likelihood of implying causation. Science isn’t above criticism, not even from the peanut gallery, but shouting “your science is bad because sometimes science is bad” isn’t very useful.
Lesson: Don’t be too skeptical.
Correlations are Important:
Even when they have nothing to do with cause. A good deal of intelligence is knowing how to make good guesses based on available information. You want to know when two things are strongly linked, regardless of the causal relationship. If you live in a dry climate and work in a secret base underground, and I asked you to guess whether or not it was raining outside, you’d probably guess that it wasn’t because that is the safe bet in a dry climate. If three people carrying umbrellas trudge into the office, and then I asked you again, you might change your mind. Does this mean that you think umbrellas cause rain? No, it just means that you think the two coincide a lot. If you are presented with a strong correlation, and you dismiss it with “correlation does not imply causation” you might be missing the point.
Lesson: Consider all the implications of the information.
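The umbrella guess can be written down as a single application of Bayes’ rule. The three probabilities below are numbers I made up to stand in for “dry climate” and “people rarely carry umbrellas for no reason”; the function name `posterior` is likewise just for this sketch. Nothing in it says umbrellas cause rain. It only uses how often the two show up together.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: probability the claim is true after seeing the evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# All three probabilities are invented for illustration.
p_rain = 0.05               # dry climate: rain on 5% of days
p_umbrellas_if_rain = 0.90  # three coworkers with umbrellas on a rainy day
p_umbrellas_if_dry = 0.02   # the same sight on a dry day

p_rain_given_umbrellas = posterior(p_rain, p_umbrellas_if_rain, p_umbrellas_if_dry)

print(f"chance of rain before the umbrellas: {p_rain:.0%}")
print(f"chance of rain after the umbrellas:  {p_rain_given_umbrellas:.0%}")
```

Under these assumed numbers the guess moves from “almost certainly dry” to “probably raining” on the strength of a correlation alone, with no causal story required.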
And so I will wrap up with a final litmus test for navigating pop-science: if you are reading science, and it isn’t boring or difficult, then you aren’t reading science.