Tuesday, September 21, 2010

Trying to use statistics well

kw: observations, musings, statistical distributions

A simple illustration is more enlightening than a page of derivation, at least to me. I was thinking recently about the law of small numbers and its counterpart, the law of large numbers. Though they are related, they have quite different emphases.

The law of small numbers embodies the observation that a sample of a few items taken from a large population, such as four balls from an urn containing an unknown number of black and white balls, is quite likely to give you a very inaccurate impression of the relative distribution in the larger population. Suppose all four balls drawn are white. How likely is it that the actual population contains equal numbers of black and white balls? Not as unlikely as you might imagine. What would you guess? One chance in ten, or fifty, or 1,000?

Suppose there are ten white and ten black balls. As you extract ball after ball, each being white, the odds for each draw are:
  • 10/20 = 0.5000
  • 9/19 = 0.4737
  • 8/18 = 0.4444
  • 7/17 = 0.4118
Multiplying these four together, we find the total probability of drawing four white balls to be 0.04334, or about one in 23. If the number of balls is much larger, the probability approaches 0.5⁴ = 0.0625, one chance in 16. How does that square with your guess above?
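The draw-by-draw product above is just the hypergeometric probability of pulling all white balls. A minimal sketch of both calculations (the function name `all_white_prob` is mine, not from the post):

```python
from math import comb

def all_white_prob(white, black, draws):
    """Probability that `draws` balls drawn without replacement are all white."""
    return comb(white, draws) / comb(white + black, draws)

# Ten white and ten black balls, four draws:
p_small = all_white_prob(10, 10, 4)   # ~0.04334, about 1 in 23
# With a very large, evenly split urn the draws are nearly independent:
p_large = 0.5 ** 4                    # 0.0625, 1 in 16
print(p_small, p_large)
```

The combinatorial form gives the same 0.04334 as multiplying 10/20 × 9/19 × 8/18 × 7/17 by hand.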

Now, suppose you decide to draw twenty balls, hoping to get a better estimate of the distribution. If you don't know there are only twenty balls, you don't know, without trying to draw a 21st ball, that you've taken the entire population! But if there are a great many balls, and you draw twenty white balls, you have a much better idea that there must be very few black balls, because 0.5²⁰ = 0.00000095, about one chance in a million. On the other hand, you can state that, if only one-tenth of the balls are black, 0.9²⁰ = 0.1216, or one chance in about eight. But this is not the law of large numbers. It is still making inferences from a single sample.
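The two twenty-draw figures are quick to check, treating draws from a very large urn as independent:

```python
# If half the balls are white, twenty white draws in a row is ~1 in a million:
p_half_white = 0.5 ** 20    # about 9.5e-7
# If nine-tenths are white, the same streak is unremarkable:
p_mostly_white = 0.9 ** 20  # about 0.1216, roughly 1 in 8
print(p_half_white, p_mostly_white)
```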

The law of large numbers expresses the surprise we feel when a "one in a million" event occurs, until we realize that there are billions of events occurring every day among the seven billion people on this planet: Given a large number of events, some of them are bound to be very unlikely. For this, a different kind of illustration is in order. The following four charts show Gaussian, or Normal, distributions containing one hundred, one thousand, ten thousand, and 100 thousand observations. I have them all here in a bunch, as Minitab "statistical summary" charts. Click on each to see it in more detail.


The overall impression is what is important at first. The first chart shows how a relatively small number of observations of a truly Gaussian variable add up to a rather poor fit to the normal bell curve. The fourth chart shows how a large number of observations produces a much better fit.

The standard Gaussian, or standard Normal, variable has mean of zero, standard deviation of one, and zero skewness and kurtosis. I'll explain the latter two terms in case they are new to someone.
  • Skewness represents an imbalance between the right and left halves of a distribution. Positive skew means the distribution is "right-heavy". Natural phenomena that exhibit positive skew often follow a lognormal distribution. The power law distribution, beloved of fractal enthusiasts, has the greatest positive skew of any naturally-produced variable.
  • Kurtosis refers to an imbalance between the "pointiness" of the distribution and the heavy or light tails that result. Positive kurtosis yields a super-Gaussian or Leptokurtic distribution. "Lepto" means "skinny" or "small" in Greek (the smallest Greek coin is the Lepton). Negative kurtosis yields a flattened or Platykurtic distribution (think Plate). Positive kurtosis means the tails are heavier than "normal", and extreme events are more likely than you might expect. Negative kurtosis means the tails are light and extreme events are scarcer than expected. Significantly negative kurtosis is rare in nature.
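Both quantities are easy to estimate from a sample. Below is a minimal sketch using the plain moment formulas (Minitab applies small-sample adjustments, so its numbers will differ slightly; the function name `skew_kurt` is mine):

```python
import random
from statistics import fmean, pstdev

def skew_kurt(xs):
    """Sample skewness and excess kurtosis (kurtosis minus 3, so a
    perfect Gaussian scores zero)."""
    m = fmean(xs)
    s = pstdev(xs)
    n = len(xs)
    skew = sum((x - m) ** 3 for x in xs) / (n * s ** 3)
    kurt = sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3.0
    return skew, kurt

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]
print(skew_kurt(sample))   # both near zero for a large Gaussian sample
```

With 100,000 draws, both statistics land within a few hundredths of zero, echoing the convergence shown in the table below.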
I planned to present a table of the key parameters of the four distributions, but the Blogger interface does not support HTML tables properly. So I'll finesse it this way. Each statement below gives the values of a parameter for the four data sets in order, for 100, 1,000, 10,000, and 100,000:
Mean: 0.0032, -0.0276, 0.0136, 0.0010
CI Mean: ±0.1831, ±0.0625, ±0.0195, ±0.0062
CI Mean refers to the confidence interval for the mean, as calculated by Minitab. Note how rapidly these shrink: the interval narrows in proportion to one over the square root of the number of observations.
St Dev: 0.9231, 1.0070, 0.9941, 0.9975
Skewness: 0.0865, 0.0483, 0.0062, 0.0002
Kurtosis: -0.581, -0.169, 0.019, -0.033
Extremum: 2.2, 3.2, 4.0, 4.7
I know these are harder to read than a properly formatted table. The last line illustrates the law of large numbers. While a value such as 4.7 is possible for a 100-observation sample, it is very unlikely. Such a value occurs about once per 770,000 observations, so it is actually rather unlikely even for the sample of 100,000. The expected extremum for each sample is 2.33, 3.09, 3.73, and 4.27.
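The expected extrema quoted above come from asking where the upper-tail probability of the standard normal equals one over the sample size. A sketch using Python's built-in `statistics.NormalDist` reproduces them to within rounding:

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# The expected largest value in n draws sits roughly where the
# upper-tail probability equals 1/n:
for n in (100, 1_000, 10_000, 100_000):
    print(n, std_normal.inv_cdf(1 - 1 / n))
```

This prints values close to 2.33, 3.09, 3.72, and 4.27, matching the list above to within rounding.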

OK, how is this practical in my daily life? Let us consider height. The standard deviation for the height of American males is three inches (7.6 cm), and the mean value is currently seventy inches (178 cm). Out of 100,000 American men, what is the expected extremum? Take 4.27 × 3 to get 12.8, nearly thirteen inches or 32.5 cm. In such a sample, then, you can reasonably expect there will be one man nearly 83 inches tall (6-ft 11 or 2.1 m) and one man as short as 57 inches (4-ft 9 or 145 cm). What does it take to get someone eight feet tall (I think there is currently one Chinese man this tall)? That is an excess of 26 inches, or 8.67 standard deviations. Minitab tells me that the expected proportion is one chance in 4.6×10¹⁷. The tallest expected man out of 3.5 billion males on earth today is at 6.175 standard deviations, which is an extra 18.5 inches, or a total of 88.5 inches (7-ft 4.5 or 2.25 m).
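The height figures can be checked the same way. One wrinkle: that far out in the tail, `1 - cdf(z)` underflows to zero in floating point, so `math.erfc` is the reliable route (the helper `upper_tail` is my name for it):

```python
from math import erfc, sqrt
from statistics import NormalDist

mean_in, sd_in = 70.0, 3.0   # US male height in inches, as given in the text

# Expected extremum out of 100,000 men:
z = NormalDist().inv_cdf(1 - 1 / 100_000)   # about 4.27
print(mean_in + z * sd_in)                   # tallest, just under 83 inches
print(mean_in - z * sd_in)                   # shortest, about 57 inches

def upper_tail(z):
    """P(Z > z) for a standard normal; erfc stays accurate far out in
    the tail where 1 - cdf(z) would round to zero."""
    return 0.5 * erfc(z / sqrt(2))

# Eight feet is 96 inches: (96 - 70) / 3 = 8.67 standard deviations.
print(1 / upper_tail((96 - 70) / 3))         # on the order of 10**17 to one
```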

Finally, can a study of statistics help us out in the stock market? Can a fellow get rich? See the following chart (click on it to see more clearly).

This is the daily variation for AT&T stock (symbol T) since 1984, nearly 6,600 observations. Note that the two or three central bars push way above the fitted curve, and that the tails also rise above the curve. This is classic positive kurtosis. It means that unexpectedly large deviations from the normally-fitted curve are very likely. In fact, although the standard deviation of the fitted curve is 0.76, the extremum is 7, more than nine standard deviations away!
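A heavy-tailed return series shows up immediately in the excess-kurtosis statistic. I don't have the AT&T data to hand here, so this sketch uses a synthetic stand-in, a two-regime mixture that is a common toy model for the calm-then-volatile behavior of real markets:

```python
import random
from statistics import fmean, pstdev

random.seed(7)

# Synthetic stand-in for daily price changes (NOT the AT&T data):
# mostly a calm regime, occasionally a volatile one.
changes = [random.gauss(0, 0.5 if random.random() < 0.9 else 2.0)
           for _ in range(6_600)]

m, s, n = fmean(changes), pstdev(changes), len(changes)
excess_kurtosis = sum((x - m) ** 4 for x in changes) / (n * s ** 4) - 3.0
print(excess_kurtosis)   # well above zero: heavier tails than a Gaussian
```

A pure Gaussian sample of this size would score near zero; the mixture scores far above it, just as the stock data does.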

These data are far from "normal", which gives us our first hint that the stock market is a great place to lose your shirt, as millions of investors already know. Of course, a plus extremum is as likely as a minus. The problem is, by the time one happens, you've missed it anyway, and because the stock market has a history (investors remember), regression toward the mean is more common than for a truly random variable. Study the curve above to understand the shape of investor emotion, which is what drives it away from a Gaussian normal curve: when the daily motion is small, the next day's motion is likely to be small also. A big motion might precipitate an even bigger motion the next day, but is more likely to be followed by a much smaller change, closer to the mean. Lots of "technical analysts" stay up late at night trying to fit time series estimators to stock market motions. None of them gets rich. D'you know what my richest friend invests in? Real estate. As Will Rogers said, "Buy land. Nobody is making any more of it."
