On Outliers: What They Represent, and Why the Central Limit Theorem Is Typically Off
by Danielle Fong
The central limit theorem states that if you have many small, independent random variables, then their sum is distributed approximately as a bell curve. Strikingly, almost everything is made up of many small parts, and these parts don’t tend to influence each other very much.
So much of what we can measure seems to fit a bell curve. This is why the normal distribution works so often. Because the assumption tends to hold, it is usually taken as a matter of course. Students are taught it, lecturers preach it, researchers apply it, and startlingly few stop to question it.
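A minimal simulation sketches the idea (Python with NumPy; the distribution, sample sizes, and variable names are illustrative choices of mine, not anything taken from the theorem or the post): sum many small, independent pieces and the totals come out looking like a bell curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "measurement" is the sum of many small, independent pieces
# (here: uniform on [-1, 1]); the CLT says those sums should look Gaussian.
n_parts = 1000      # independent pieces per measurement
n_samples = 10_000  # how many measurements to simulate

sums = rng.uniform(-1.0, 1.0, size=(n_samples, n_parts)).sum(axis=1)

z = (sums - sums.mean()) / sums.std()
print("fraction beyond 3 sigma:", np.mean(np.abs(z) > 3))  # ~0.0027, as a bell curve predicts
```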
Suppose the variables are not small, or suppose they’re not independent. Suppose, under certain conditions, the value of one variable would seriously affect another. Suppose we’re talking about the buildup of snow on a mountain slope. Most of the time, snowflakes can gradually build up without significant effect. But once enough has built up, you don’t find snowflakes resting calmly upon a drift. What you find is an avalanche.
The sum total of snowflake movement isn’t what we might expect. The snowflakes on top used to be lightly packed down by the new ones gradually coming down. The snowflakes on the bottom used to just sit there. But now they’re not just sitting there. They’re moving fast, and they’re moving down.
The central limit theorem doesn’t always hold! If you have a model where, most of the time, a change in one part doesn’t affect another much, but some of the time it really does, then you can’t assume that your outcomes will follow a bell curve. Entirely different outcomes are possible.
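Here is a toy sketch of that kind of interaction, with numbers I made up: most steps are tiny additions, but once the pile crosses a threshold, a single step wipes it all out. The step-to-step changes then look nothing like a bell curve.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy snowpack: small independent amounts fall each step, but once the pack
# crosses a threshold the whole thing lets go in a single step (an avalanche).
threshold = 100.0
pack = 0.0
changes = []
for _ in range(100_000):
    snowfall = rng.uniform(0.0, 1.0)        # small, independent addition
    pack += snowfall
    if pack > threshold:
        changes.append(-(pack - snowfall))  # net change this step: the pile collapses
        pack = 0.0
    else:
        changes.append(snowfall)

changes = np.asarray(changes)
z = (changes - changes.mean()) / changes.std()
print("fraction beyond 6 sigma:", np.mean(np.abs(z) > 6))
# A bell curve predicts about 2e-9; this toy model gives roughly 1 in 200.
```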
Traditionally, when you have some measurement that’s far outside of normal, you call this an outlier. In much statistical analysis, these are ignored or thrown out (for example, even the most extraordinary measurements won’t affect a median). This is useful if you want to study the ordinary behavior of something. But sometimes, the information you gain from outliers is by far the most interesting. Some of the outliers are expected in normal distributions (bell curves). But some of them are outliers because the model doesn’t apply. Some outliers are avalanches.
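For instance (numbers invented for the example), a single extreme reading drags the mean around but leaves the median essentially untouched:

```python
import statistics

readings = [4.9, 5.0, 5.1, 5.2, 980.0]   # one avalanche-sized outlier
print(statistics.median(readings))        # 5.1  -> barely notices the outlier
print(statistics.mean(readings))          # ~200 -> dragged far off by it
```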
We started from an iffy assumption. Not everything is made up of independent random variables. Parts affect each other. Sometimes it’s violent, like a chain reaction, or our avalanche. And sometimes, it’s magical. Life is an example of this. If the chemicals that made up our bodies didn’t bond so strongly to one another, our DNA would unwind, and you’d be a puddle.
Statisticians try to account for this. One example on the tips of our tongues lately: finance. Roughly speaking, the ‘beta’ used in finance means the volatility of a stock once the linear correlations with the company’s existing portfolio have been factored out. Despite this, there seem to be ‘six-sigma events’ happening all the time: things which, according to the theories and the ‘ordinary’ data, really shouldn’t be happening at all.[1] What’s going on?
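As a back-of-the-envelope check (SciPy assumed available; 252 trading days per year is just the usual convention), a normal model says how rare such an event ought to be:

```python
from scipy.stats import norm

p = 2 * norm.sf(6)   # two-sided probability of a move beyond 6 sigma
print(p)             # ~2e-9
print(1 / (p * 252), "years between events at one observation per trading day")  # ~2 million years
```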
When small effects just add up simply, you can model them by what’s called a linear model (so called because if you add up small things along an XY graph, you’ll get a line). This doesn’t always work, and in fact, most interesting phenomena are non-linear.
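By way of contrast, here is the simplest case where effects really do just add up, so a linear fit recovers them (a sketch with synthetic data; the coefficients 3.0 and 1.0 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Small, independent effects that add up simply: a straight line plus noise.
x = rng.uniform(0.0, 10.0, size=500)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.5, size=500)

slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)   # close to the true 3.0 and 1.0
```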
Models are limited. They break down. And one can’t really account for every possibility. Financial markets can collapse due to dust bowls, the collapse furthered by widespread investor panic. The destructive power of armies during world wars can be dwarfed by exhaustion and a powerful flu, though normally flu is beaten with chicken soup.[2] And planets full of ten-story-tall reptiles can be wiped out by meteorites. No number of small rocks would matter — usually they just ping off them. Bet they didn’t see that coming.
It isn’t too long before one fears running directly into Gödel’s Incompleteness Theorems: checking all assumptions isn’t merely inconvenient; for most interesting problems, it’s impossible.[3]
The next time you hear about your monthly six, or seven, or eight sigma event, keep this in mind. Outliers are where the model breaks down. They happen more often than standard models would expect, and they often point out problems in the understanding of the system. If they start happening all the time, start mistrusting your statisticians. And your Wall Street.[4]
One can show the recursiveness of the model verification problem. Suppose one says to you: drug x worked better than drug y, with 95% confidence. You might reply: what’s the confidence of that confidence? Somewhere along the line, they’ll have to say 100%, and then they need to prove consistency. One can show that theories with unknown external variables and relationships (as are dealt with in statistics) aren’t even formally recursively enumerable,[5] but even if they were, no such theory could contain a statement of its own consistency.
Back on planet Earth, is there anything we should be worried about? Some who talk about the climate claim that, since the temperature of the Earth basically just goes up and down naturally, it’s not really anything to worry about. But outliers can mark where models break down. Where systems change. This is a chart of temperature over the past few centuries. You may notice, to the right, an outlier. Maybe it’s telling us something.[6]
Further Reading:
[1] The Psychology of Human Misjudgment – Charlie Munger
“Now let’s talk about efficient market theory, a wonderful economic doctrine that had a long vogue in spite of the experience of Berkshire Hathaway. In fact one of the economists who won — he shared a Nobel Prize — and as he looked at Berkshire Hathaway year after year, which people would throw in his face as saying maybe the market isn’t quite as efficient as you think, he said, “Well, it’s a two-sigma event.” And then he said we were a three-sigma event. And then he said we were a four-sigma event. And he finally got up to six sigmas — better to add a sigma than change a theory, just because the evidence comes in differently. [Laughter] And, of course, when this share of a Nobel Prize went into money management himself, he sank like a stone.”
[2] Wikipedia on the Spanish Flu.
[3] For the curious, a quick way to show this is by adding a ‘randomness’ oracle to a Turing machine and showing that you still can’t solve the halting problem.
[4] Wikipedia on the 2007 subprime mortgage financial crisis.
[5] As Wikipedia states, a recursively enumerable language is a formal language for which there exists a Turing machine (or other computable function) that will enumerate all valid strings of the language.
[6] (update) For further musings on uncertainty and randomness, a terrific book is Nassim Taleb’s The Black Swan. A related essay is available here. Also intriguing is the working paper “Extreme Sample Selection Bias: Conditions that Cause the Correlation Between Two Variables to Switch Signs” by Tim Groseclose.
The first reaction after reading this is a smile on my face :)
I graduated in Mathematics and I am working as a decision-scientist for a leading FMCG.
That should explain the smile.
Yeah, but the central limit theorem says something about populations. You’re basically like a small child who, told that the average height is 5′11″ or whatever, goes, yeah, but my daddy’s eight foot tall. It’s not meaningful in terms of the population that there is an eight-foot-tall man.
Dr Zen,
The central limit theorem might be about the averages of populations, but if it’s about averages, we can multiply the average by the number of elements and arrive at a sum, and then the sum can be interpreted as the result of a series of independent influences added linearly.
Have you read Per Bak’s “How Nature Works”? The book has a very interesting perspective on outliers and the impact they have, approaching the problem from the perspective of inverse-frequency laws, power laws, and Zipf’s law.
Like Chief Sitting Bull, Tom Paine
Dr. Martin Luther King, Malcolm X
They were renegades of their time and age
The mighty Renegades
(RATM)
From a human perspective, renegades (or outliers) are absolutely essential to the advancement of the human condition. Without people willing to challenge the status quo, society would be static. And even if you’re on the right tracks you’ll still get run over if you are not moving forward.
Great essay. In the short term, experts of the day hold sway and linear interpretations are in vogue, or simple cyclical models. Outliers are often the key to bigger processes and timeframes. As you note, Taleb has written on this for quite a while with an insider’s and outsider’s viewpoint. Future Babble is a good book on the fallibility of experts: foxes and hedgehogs.
The outliers in the temperature graph tell me that splicing proxy data and instrumental data are a good way to scare people (see here). Do you know which reconstructions these are, and whether or not they use the same proxies all the way through or if they graft the instrumental data on the end?
Check out Richard Muller’s work. The most rigorous exploration of possible statistical anomalies shows little difference from the ordinary analysis — the results are robust.
http://www.nytimes.com/2012/07/30/opinion/the-conversion-of-a-climate-change-skeptic.html?pagewanted=all
Just because the assumptions of the CLT are not met does not mean that the CLT does not hold. If the CLT’s assumptions are not met then the CLT’s conclusions may not hold, but the CLT still holds.
Ms. Fong,
Your article is correct in that the central limit theorem does not work well in models in which the data are not entirely independent. However, the correct application of the theorem is on a model in which the data are negligibly dependent (or, ideally, entirely independent) and the values of the data tend toward a mean value. The Gaussian normal distribution accounts for outliers in the long tail on either end; outliers are outliers because they do not happen much, and the bell curve represents the frequency of occurrence of the value of a variable in a population.
No, there does not have to be 100% confidence at any point in the line. Given a sample of data that roughly follow a normal distribution, there is only a five percent chance (assuming that the population does follow a normal distribution) that the population mean is more than two sample standard errors away from the sample mean, simply because that is the probability that a sample such as the one found would occur given a mean outside that range and a standard deviation different from the one in the sample. The only certainties are the values of the sample data. There need be no certainty unless one deals in Bayesian statistics, in which case the normal distribution is of no use anyway.
The reason that predictions do not work is that they are extrapolation, which is a mode of statistical analysis that has very little strength, because most distributions change over time if they are worth predicting.
The strength of a statistical test is also a factor, and a very complicated one. However, strength increases with sample size.
Thank you for your time and patience in the course of reading this comment. You, being human, probably needed it.
–Anonymous
P.S.: The verb meaning of “effect” is “to bring about.”