Anti-Buzz: Counter-Intuitions

(Repeat from August 2013)

Having given you a slew of practical articles, and having just returned from the very nerdy experience of a data mining conference, I’m inspired to go all math geek on you. I should really say stats geek as both mathematicians and statisticians want you to know that they are very different. I’ve danced around the topic of how people have poor intuitions about statistics and probability before, but I’ve never really gone down the list. Some of this you will have heard before.

Events are not independent – this example is trotted out so often it is becoming less true. As you know, it is unlikely that you will flip heads on a coin four times in a row, but if you flip heads three times, the odds of the fourth flip coming up heads are the same as they ever were: 50/50. Somebody who thinks tails is “due” is guilty of this fallacy. Believing that something karma-like has a hand in random trials is superstitious at best. That said, people are still guilty of this more often than they might think; we still think “luck catches up to you” and that doom awaits those who have enjoyed good luck so far.
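You can check this for yourself. Here is a minimal Python sketch that simulates many runs of four coin flips, keeps only the runs that started with three heads, and looks at the fourth flip:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Simulate many 4-flip sequences and keep only those whose
# first three flips all came up heads.
fourth_flips = []
for _ in range(200_000):
    flips = [random.random() < 0.5 for _ in range(4)]  # True = heads
    if all(flips[:3]):
        fourth_flips.append(flips[3])

# Among sequences that began with three heads, the fourth flip
# is still heads about half the time -- tails is never "due".
rate = sum(fourth_flips) / len(fourth_flips)
print(f"P(4th is heads | first three were heads) ~ {rate:.3f}")
```

No matter how many heads came before, the simulated fourth flip hovers around 50%.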

Events are always independent – however, thanks in part to the above being such a widely touted phenomenon, people have developed the wrong intuition in the other direction. The above is about independent events – if I win the lottery and am in a car accident on the way to collect my winnings, my odds of surviving that car accident are not influenced by that great stroke of luck in winning the lottery. The superstitious impulse is what drives people to believe nothing is independent. The impulse to look smarter than superstitious people is what drives everyone else to believe that everything is independent.

The odds of flipping two heads in a row are 1/4, but if I tell you that at least one head was flipped, the odds of two heads become 1/3. If that boggles your mind, then I suggest you don’t wag your finger at every “superstitious” person who thinks some events are dependent on each other. Because sometimes they are. The classic example of everybody getting this wrong is the Monty Hall problem.
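The 1/3 is easy to verify by brute force: list all four equally likely outcomes of two flips, then throw away the ones the condition rules out. A short Python sketch:

```python
from itertools import product

# All four equally likely outcomes of two fair flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Unconditionally, two heads is 1 outcome in 4.
p_two_heads = sum(o == ("H", "H") for o in outcomes) / len(outcomes)

# Condition on "at least one head": TT is ruled out,
# leaving HH, HT, TH -- so two heads is now 1 in 3.
at_least_one = [o for o in outcomes if "H" in o]
p_given = sum(o == ("H", "H") for o in at_least_one) / len(at_least_one)

print(p_two_heads)  # 0.25
print(p_given)      # 0.3333...
```

Learning that at least one head was flipped changed the probability of the other flip's outcome; the two facts are not independent.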

Randomness is uniform – the common use of the word “random” or “randomly” is almost always referring to a very specific kind of random: uniform, (and discrete), randomness. If a company randomly selects people for drug tests, the assumption is that everybody has an equal chance of being selected, but “randomly” does not inherently guarantee such a process. I could devise a system that has a 90% chance of selecting employee A, and the other 10% is evenly spread across the rest of the company. I am still randomly selecting an employee, but it is certainly not what most people mean when they use the word “random.” This semantic distinction, however, is just the burden of a statistician, sort of like how physicists cringe at every Star Trek warp jump; there’s no use being a total buzzkill about it. Still, the popular bias toward thinking randomness implies uniformity can lead to armchair statisticians completely misinterpreting reported statistics or models.
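The lopsided drug-test selector above takes one line to build. This sketch uses Python's weighted sampling (the employee names are, of course, made up):

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the run is reproducible

employees = ["A", "B", "C", "D", "E"]
# Random, but far from uniform: A has a 90% chance of selection,
# and the remaining 10% is split evenly among the rest.
weights = [0.90, 0.025, 0.025, 0.025, 0.025]

draws = Counter(random.choices(employees, weights=weights, k=10_000))
print(draws.most_common())  # A dominates, yet every single pick was "random"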

Speaking of uniform randomness, if you want a puzzle, work out why you can’t uniformly select any integer at random, (from all of the integers, that is – of course you can uniformly select integers from some range; a six-sided die proves this much).

Unlikely events shouldn’t happen – Whenever something unlikely happens, the intuition is to disbelieve it, or rather, disbelieve that luck had anything to do with it. This is actually reasonably healthy, as you can sometimes suss out a lie with it – “Oh, your son just happened to win the raffle you were running?” – and what most people don’t realize is that they are exhibiting an understanding of conditional probability: What are the odds that the organizer’s son wins a fair raffle? Same as everyone else. What are the odds that he wins it if the raffle is rigged? Much higher. If you are suspicious of the outcome, it is because you worked out some conditional probabilities and then understood that a certain set of conditions made the outcome you observed more likely. In this scenario, there is another probability to consider: how likely do you think it is that the organizer rigged the raffle? Most people miss this step. When taking everything into consideration, sometimes the scenario that seems more likely at first blush is actually ridiculous. Disbelief of unlikely events leads to paranoid conclusions.
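That missing step is exactly what Bayes’ theorem accounts for. Here is a sketch of the raffle with made-up numbers (100 tickets, and an assumed 1% prior belief that the organizer would actually cheat):

```python
# Hypothetical numbers for the raffle example.
p_win_fair = 1 / 100    # P(son wins | fair raffle): 100 tickets, he holds one
p_win_rigged = 0.95     # assumed: a rigged raffle almost always picks him
p_rigged = 0.01         # the step people skip: prior belief it was rigged

# Bayes' theorem: P(rigged | son won)
p_son_won = p_win_rigged * p_rigged + p_win_fair * (1 - p_rigged)
p_rigged_given_win = (p_win_rigged * p_rigged) / p_son_won
print(f"{p_rigged_given_win:.2f}")  # roughly 0.49: suspicious, far from certain
```

Even with a 1% prior, the son's win raises the probability of a rigged raffle to about 49% under these assumptions; it does not make cheating a sure thing, and with a more charitable prior it stays an unlikely explanation.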

Going broader, people tend to feel shocked when something extremely unlikely happens. The fallacy is the belief that only likely things “should” ever happen, but in the long run, this is silly. I scare-quoted “should” because that language implies some sort of karma-like justice that balances probability in the end. It is, in other words, superstition. The odds of nothing strange ever happening are very low. These sisters, for example, are on the receiving end of a very unlikely coincidence, but civilization wide, the odds of it never ever happening are even slimmer. When the great cosmic coincidence happens to you, you can feel lucky if you want, but the odds were that it was going to happen to somebody.

The mean is meaningful – The mean, (or average), is pretty much the only important statistic that people understand, and it is an important and useful statistic. But it is pretty much the only statistic that people ever see in discourse, and it often tells a very incomplete story. Let’s say you hear a news story about “the average person” – “the average person consumes 3,770 calories per day”, “the average person uses 23.6 rolls of toilet paper per year”, “The average family has 2.3 children.” I think you can see that those numbers have varying degrees of accuracy, even though they are all the correct average. The latter somewhat famously got the public to understand the unhelpfulness of the average; nobody has 2.3 children. We might assume that the toilet paper estimate accurately describes most people, while the caloric intake probably does not. Collecting large amounts of data on a population usually tells a much more complicated story than the simple average will tell you.

When I taught entry level computer science courses, the students mostly fell into two groups: those who “got it” and aced the course easily, and those who didn’t, and failed. The distribution of final grades was nothing like a bell curve at all. There were an awful lot of As and an awful lot of Fs. The “average student” got a C, but the C student was really an outlier. The “average grade” did not tell a useful story about those classes. And if you compared them to more normal courses that also had a C average, you might conclude they were of similar difficulty, and you would be completely blind to the nuances of how those courses were difficult.
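A toy version of that grade distribution makes the point concrete. The class sizes below are invented for illustration:

```python
from statistics import mean

# A stylized bimodal class: many As (4.0), many Fs (0.0),
# and only a handful of actual Cs (2.0).
grades = [4.0] * 45 + [0.0] * 45 + [2.0] * 10

print(mean(grades))                     # 2.0 -- a "C average"
print(grades.count(2.0) / len(grades))  # 0.1 -- yet only 10% earned a C
```

The mean alone cannot distinguish this class from one where nearly everyone earned a C.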

The average isn’t wrong; it’s just often very incomplete. And yet we form our politics and our opinions of our society on “statistical reports” that really just amount to tracking the mean.

To end on a practical note: In business, the mean can be misleading, especially when measuring customer experience. The “average visit” to your office is probably very good, and you need it to be very good, but is the average visit the one you should be concerned about? It’s the patient visits that go wrong that should make you worry. Those are the ones that lose customers and generate negative word of mouth. If you focus only on improving the quality of the average visit, you will ignore the impact you are having on the negative outliers. You might improve the average while actually increasing the number of bad experiences you create. The average visit does not tell the story of where you need to improve, especially if your average visit is already very good. Focusing on improving your worst experiences will address the real problems and, incidentally, it will also improve your average.
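That last trap is easy to demonstrate with numbers. The satisfaction scores below are hypothetical: a change that polishes the typical visit while neglecting the worst ones can raise the average and double the bad experiences at the same time.

```python
from statistics import mean

# Hypothetical visit-satisfaction scores (1-10), before and after
# a change that improves typical visits but neglects the worst ones.
before = [9, 9, 8, 9, 9, 8, 9, 3, 9, 9]
after  = [10, 10, 10, 10, 10, 10, 10, 2, 2, 10]

print(mean(before), mean(after))   # 8.2 vs 8.4: the average went UP...
bad_before = sum(s <= 3 for s in before)
bad_after = sum(s <= 3 for s in after)
print(bad_before, bad_after)       # 1 vs 2: ...while bad visits doubled
```

Tracking the worst scores (the tail of the distribution) rather than the mean is what catches this.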
