Monday, October 24, 2011

2010 Census and lognormal statistics

kw: census records, statistics, statistical distributions, analysis

Some time ago I wrote briefly about the US state population distribution in the 2010 Census, but the county data had not yet been released. They came out a couple of months ago, and I have just had the chance to review them.

This chart shows the county populations plotted in lognormal coordinates. While the fit is not perfect, the lognormal fits better than any other distribution Minitab can analyze.
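For readers without Minitab, here is a minimal sketch of one way to check a lognormal fit. It is not necessarily what Minitab computes. It takes logarithms, sorts them, pairs each with a standard-normal quantile (the same pairing a lognormal probability plot draws), and reports the correlation of those pairs. A value near 1 means the points fall close to a straight line. The function name `lognormal_probability_fit` and the choice of Blom plotting positions are my own assumptions. The input would be a list of county populations, which I have not reproduced here.

```python
import math
from statistics import NormalDist, fmean, stdev


def lognormal_probability_fit(values):
    """Fit a lognormal by taking logs, and score how straight the probability plot is.

    Returns (mu, sigma, r): the mean and sample standard deviation of log(values),
    and the correlation between the sorted logs and standard-normal quantiles.
    r close to 1 means the points lie near a straight line in lognormal coordinates.
    """
    logs = sorted(math.log(v) for v in values)  # every value must be > 0
    n = len(logs)
    if n < 3:
        raise ValueError("need at least 3 positive values")
    # Blom plotting positions (i - 3/8) / (n + 1/4), one common convention among several
    z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mz = fmean(logs), fmean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(logs, z))
    sxx = sum((a - mx) ** 2 for a in logs)
    szz = sum((b - mz) ** 2 for b in z)
    r = sxz / math.sqrt(sxx * szz)  # Pearson correlation of (log value, normal quantile)
    return mx, stdev(logs), r
```

The correlation only summarizes straightness. Looking at where the points bend away from the line (usually in the tails, for county data) tells you more than the single number does.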

The lognormal distribution is appropriate for collections of values produced by dividing up a fixed quantity according to several criteria. Each division multiplies a share by some fraction, so the logarithm of a share is a sum of many terms. The more "reasons" there are for division, the closer to a lognormal distribution the data will be, according to the logarithmic (multiplicative) form of the Central Limit Theorem (CLT).
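A toy simulation makes the argument concrete. This is an illustration under made-up assumptions, not an analysis of the census data. A total of 1.0 is cut 12 times, and each cut keeps a random fraction drawn uniformly from [0.05, 0.95]. Both the number of cuts and the range of the fractions are arbitrary. The resulting shares are strongly right-skewed, but their logarithms, being sums of 12 independent terms, come out nearly symmetric, as the CLT predicts.

```python
import math
import random
from statistics import fmean, pstdev


def skewness(xs):
    """Population skewness: 0 for symmetric data, > 0 for a long right tail."""
    xs = list(xs)
    m, s = fmean(xs), pstdev(xs)
    return fmean(((x - m) / s) ** 3 for x in xs)


def divided_shares(total, n_reasons, n_samples, rng):
    """Split `total` n_reasons times; each split keeps a random fraction of the share."""
    shares = []
    for _ in range(n_samples):
        share = total
        for _ in range(n_reasons):
            # Hypothetical: each "reason" keeps a uniform fraction in [0.05, 0.95]
            share *= rng.uniform(0.05, 0.95)
        shares.append(share)
    return shares


rng = random.Random(2010)  # fixed seed so the run is repeatable
shares = divided_shares(1.0, n_reasons=12, n_samples=20_000, rng=rng)
log_shares = [math.log(s) for s in shares]
print(f"skewness of shares:      {skewness(shares):.2f}")
print(f"skewness of log(shares): {skewness(log_shares):.2f}")
```

The first number comes out large and positive, and the second is much closer to zero. Raising `n_reasons` pushes the log-skewness closer still, which is the "more reasons, better lognormal" effect in miniature.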

This repeats the US state data I plotted before, showing how well they fit a lognormal model.

As I was thinking about this, I began to wonder why people choose to live in one place or another. In my own case, I made one choice based on the school I wanted for my graduate studies, and a few other moves based on job opportunities. One move was arranged to avoid a certain climate.

There are so many reasons that the CLT is likely well satisfied, and this does appear to be the case for these state and county data. I sought another level of analysis to see whether I could tease out a few of the big factors.

This chart shows one factor and hints at one or two more. For each state, it plots average yearly rainfall horizontally and population density vertically. The state abbreviations help us discern part of what is going on.

With the exception of Alaska, there appears to be an upward-trending lower bound, anchored on the left by the cold, dry north-central states and on the lower right by the hot Gulf Coast states. The odd combination of NE, OR and ME fills in the middle. Moving upward from this bound, I think I see two trends. First, the states that were among the original Thirteen Colonies have high population densities, capped by New Jersey. Second, three states, CA, FL and HI, are "great climate" states. Among the remaining states, warmth tends to push density upward, though there is a lot of scatter.

This just scratches the surface of such an analysis. I suspect factors such as the unemployment index and state tax laws have their own influence. For example, I would expect Oregon's density to be closer to Delaware's, since neither state has a sales tax and both have similarly mild climates. Clearly, other factors come into play (such as half of Oregon being mountainous and thus rather unlivable unless you are Jeremiah Johnson). For the moment, I've gone as far as I can without pulling together a lot more data.

1 comment:

  1. Nice use of Minitab to dig deeper into the census data! The scatterplot of density vs. precipitation was a great idea, and there're definitely a lot of other factors that could be incorporated into future analyses.
