Monday, July 29, 2013

The hand that rocks the cradle…

kw: book reviews, nonfiction, sociology

Y'know that T-shirt that says:
If Mama Ain't Happy
Ain't Nobody Happy!
I wish it were truer than it really is. A colleague of mine (now deceased) once said, in his best Bill Cosby voice (though I don't know if the Coz ever said it), "A woman's power lies in saying, 'If you don't, then I won't!'." There's just one trouble with this. Powerful men seldom love any woman enough to care what she may think. Worldwide, all the really powerful men are either adulterous (often rampantly so), or have a harem, in societies that expect that sort of thing from sultans and princes.

I just finished reading The Athena Doctrine: How Women (and the Men Who Think Like Them) Will Rule the Future, by John Gerzema and Michael D'Antonio. In their Introduction, they present ten virtues that women have (or seem to have) in greater measure than men, and state that these virtues comprise the Athena Doctrine, a wiser way to run governments, businesses and families. They illustrate these virtues and combinations of them with stories from ten countries or groups of similar countries. They close by promoting a balance between traditionally male and traditionally female approaches to problem solving and management.

I agree with their thesis. In fact, I'd state their subtitle thus: …How Women Must Prevail if We Are to Have a Future. But I have to say that the book needs to be redone, probably by a married couple of long standing (and, sad to say, in today's America I need to specify that I mean a man and a woman, long married to one another, with equal responsibility in producing the book). The stories are potentially fascinating (each chapter has from about 4 to about 10), but the writing is dull and couldn't hold my attention. There is just a little bit of a fawning note, as if the authors were "women wannabes". I wish it were a better book. Its message is much needed.

Tuesday, July 23, 2013

You don't have to hate statistics

kw: book reviews, nonfiction, mathematics, statistics, popular treatments

Measure something. Say, take a yardstick and measure the width of the kitchen counter. In my kitchen, I get 24 inches. That is an observation. Guess what? You can't do statistics using one observation. Not because you are somehow incompetent, but because of the way statistics is defined. A common definition is:
Statistics is the practice or science of collecting and analyzing numerical data in large quantities.
Note the final qualifier: "in large quantities". It is possible to do a certain amount of statistical inference using just a few items—and we'll do some momentarily—but you typically need lots of data to produce a robust inference. However, a few principles can be seen by analyzing just a few observations. I measured my counter in five more locations. Here are all my observations:

24
24 1/8
23 7/8
23 7/8
23 3/4 (= 23 6/8)
23 5/8

We can do a few things with these six numbers. First, comparing the largest (24 1/8) with the smallest (23 5/8), we see that the range is 1/2 inch (about 1.3cm). I can take the average, which comes to 23 7/8. Hmm; if the building plans specified a 24 inch counter top, this one averages an eighth inch too narrow. Then there is a trend. These are in order, from one end of the counter to the other. The largest measurement is the second one, the smallest is the last, and the rest of the measurements follow a decreasing trend. In angular terms, a "tilt" of 3/8" from the first measurement to the last, over about 10 feet, is only a sixth of a degree, but I'd expect a builder to do better than have "a half inch" of variation over ten feet. Oh, well. One of my projects for later this year is to replace the counter tops anyway. I hope quality control has improved since these were installed in the 1970s!

Now for just a little terminology. The "average" I figured is known as the "mean". It is not the only way to determine "central tendency". Another is the "median", which means the one in the middle (or the average of the central two if the sample has an even number of observations). For example, if I sort these six numbers (in this case, just move the 24 1/8 above the 24), it happens that there are three that are 23 7/8 or larger, and three that are 23 7/8 or smaller. So the median is 23 7/8, the same as the mean. This is not always the case, and perhaps it is not even usually the case. For example, if I have the seven numbers 1, 2, 3, 5, 8, 14, 30, the mean is 9 but the median is 5. Note that only 2 of these numbers are greater than 9.

Another such measure is the "mode", which means the value that occurs most often. Mode is really not too meaningful when there are only six observations, but for these data, the mode is also 23 7/8. Suppose instead that I had measured that fourth width as 23 3/4. This would have made very little difference to the mean (23 6.8/8) or the median (23 13/16 or 23 6.5/8), but the mode would now be 23 3/4, because that number arose the most frequently (twice).
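For anyone who wants to check these figures, here is a minimal sketch in Python (my choice of language, not anything from the book). It recomputes the mean, median and mode for the six counter widths, for the altered set with the fourth width changed to 23 3/4, and for the seven-number example. The names `widths`, `altered`, `seven` and `summarize` are made up for illustration; the widths are kept as exact fractions so that nothing is lost to rounding.

```python
from fractions import Fraction
from statistics import mean, median, mode

# The six counter widths in inches, as exact fractions:
# 24, 24 1/8, 23 7/8, 23 7/8, 23 3/4, 23 5/8
widths = [Fraction(24), Fraction(193, 8), Fraction(191, 8),
          Fraction(191, 8), Fraction(95, 4), Fraction(189, 8)]

# Same data, but with the fourth width measured as 23 3/4 instead.
altered = list(widths)
altered[3] = Fraction(95, 4)

# The seven-number example where mean and median differ.
seven = [1, 2, 3, 5, 8, 14, 30]


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)


for label, data in [("counter", widths), ("altered", altered), ("seven", seven)]:
    m, med, mo = summarize(data)
    print(f"{label}: mean={float(m):.4f} median={float(med):.4f} mode={float(mo):.4f}")
```

With only six observations the mode can flip on a single changed measurement while the mean and median barely move, which is the point of the altered set.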

This illustration shows how these are related (Image from The Daily Dongle). A frequency plot of a very regular set of measurements such as shown in (a) will have mean, median and mode that are equal or nearly equal. Sometimes we make measurements that have more than one "hump" (their distribution is called bimodal) as in (b). But (c) and (d) show two ways that a series of measurements may reveal a skewness, in which case the three measures will be quite different.

Each has its uses. Average height of Euro-American males is best described as the mean, the numerical average of all measurements. We might also surmise that the median and mode will be very similar to the mean. But if you include Euro-American women, the bimodality may not be too evident, but it is there. At the very least a frequency plot will be flatter on top and have a wider total range. If the average male is 70" tall and the average woman is 64" tall (for Euro-A's, anyway), the grand average will be 67", but that single number tells you less than the two numbers, segregated by sex.

What about yearly income, or prices of homes in a city or county, or the whole country? When you hear a Real Estate report on the radio, you will hear, for example, "Median home price has risen by $5,000 in the past month". Why not use the mean? Because the distribution is skewed. There might be a few homes with very small values, and a few with very high values, but where do you put the "middle"?

Example: Broken Arrow, OK (I know someone there). The least expensive houses on the market, as I find from Realtor.com, are in the $25,000-$50,000 range. The most expensive, in the range between $1.2 million and $1.4 million. Do you think it likely that home prices are evenly distributed between these limits, producing a "middle" value of about $700,000? Not likely! In this market, this moment, 684 homes are for sale. Houses # 341 and 342 on the sorted list the web site provides are both priced at $170,000. That is our median for this market (today). Quite a bit different from 700k, isn't it? Half the houses' owners are asking $170,000 or less, and the other half are asking more. If you can afford a $200,000 house, at the most, you have a lot to choose from. Wherever the larger values in a distribution are a big multiple of the smaller values, the median is usually the best measure of "average".

This is my simple attempt to explain a few statistical principles. Charles Wheelan does a superb job of explaining these and a goodly number of others in Naked Statistics: Stripping the Dread From the Data. In the middle of the book, for example, he dwells quite a bit on the Central Limit Theorem. This has to do with sampling.

Above, I took six measurements of my kitchen counter. I could have taken a lot more, perhaps spaced every inch, or even closer. Suppose I sent my wife into the kitchen with a yardstick and asked her to make six measurements, with the same yardstick, in locations of her choosing. Then perhaps we could grab some of our neighbors and have them repeat the experiment. Now I will have several sets of numbers, and each set will have its own average. Do you think any of the averages will be close to, say 22, or 27? Not unless there are some BIG wiggles in the counter's shape, that I avoided with my measurements. If I could get a lot of my neighbors to make sets of measurements, the Central Limit Theorem (CLT) predicts that they will be distributed a lot like section (a) of the illustration above, clustering about some average value that is close to the "real" mean for all possible measurements of my counter.
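The neighbors-with-yardsticks experiment can be imitated on a computer. The sketch below is only an illustration under a loud assumption: it treats my six measured widths as if they were the whole population of possible measurements, and draws many random samples of six from them. The function name `sample_means` and the fixed seed are my own choices for the sake of a repeatable example.

```python
import random
from statistics import mean

# Assumption: these six widths (inches) stand in for "all possible
# measurements" of the counter. A real counter offers many more.
counter_widths = [24.0, 24.125, 23.875, 23.875, 23.75, 23.625]


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples (with replacement) and return
    the mean of each one."""
    rng = random.Random(seed)
    return [mean(rng.choices(population, k=sample_size))
            for _ in range(num_samples)]


means = sample_means(counter_widths, sample_size=6, num_samples=1000)

# No sample mean can fall outside the range of the data, so nothing
# near 22 or 27 can turn up.
print("smallest and largest sample mean:", min(means), max(means))
# The sample means cluster around the population mean.
print("mean of the sample means:", mean(means))
print("population mean:", mean(counter_widths))
```

Plotting a histogram of `means` would show the single-humped shape of section (a) of the illustration, even though the six underlying widths are not distributed that way.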

As the author goes on to show, with marvelous examples, this is the source of the power of polling. Not only can a poll yield very useful results about all 180 million American adults by polling 1,000 or 2,000 people (properly chosen!), marketers (who pay the most for such data) can predict some of our preferences based on what we have already bought or even searched for (Google sells its search results, don'tcha know). My wife and I have "loyalty cards" from a few local grocers and other stores. We get discounts on certain items for scanning the card when checking out. In a sense, the store is paying us for the right to keep track of our purchases. Something else we get during checkout is a series of spot-printed coupons (the more we buy the more coupons they print). Some coupons are for more of the things we often buy. Others are for similar items of competing brands (the brands' owners are in on this also). And there will usually be a few "wild card" coupons that show up over time, for things we might not usually buy. Why? Because other people whose purchasing habits are similar to ours buy those things, and the store is betting that we are more likely to try those items if we get a coupon to prod us, compared to giving the same coupon to random shoppers. They have also figured out that we are "on the edge of elderly", so some of the coupons are for things like Ensure (an energy drink for old folks) or Depends (adult diapers).

Think about it. A typical supermarket has tens of thousands of customers that visit regularly. If 25% have the store card, they can slice and dice that population a dozen or a hundred ways, to target their coupon campaign. And, since coupons cost almost nothing to print, they can throw in 30%-50% off-the-wall coupons so we don't realize how precisely we have been targeted!

If you get nothing else out of this book, read it carefully for the author's explanation of the CLT and his stories of how it is used (such as how Target "helped" a father learn that his teen daughter was pregnant). He reveals all sorts of tricks of the trade, such as the numerical way to handle binary differences such as male/female. I am a math junkie, so of course I love a book like this. But I think the math-averse will also find it very entertaining and informative.

Monday, July 15, 2013

Bigger goes along with better

kw: book reviews, nonfiction, history, economics, sociology

I remember my father telling me, when I was a pre-teen or early teen, his theory of "why kids are getting bigger" and in particular, why new Olympic records were still being set, after generations of competition. "Better food and better medicine. When kids don't get sick as much, they can get bigger and stronger and faster and smarter."

Another memory. Valley Forge, maybe five years later. A park guide said, "General Washington was like Goliath to the British. The average 'redcoat' was about 5-foot-2, and Washington was 6-4. He rode the biggest horse he could find, and it scared the crap out of the British." He exaggerated a little (Washington was 6'-2", or 188 cm, and as the chart below shows, British recruits just prior to 1800 were about 5'-6", or 168 cm), but the point is accurate. He was the tallest man most of them had ever seen. Of course, most of his own men were similar in stature to the British, and he scared the crap out of them also.

This chart illustrates a "Waaler Surface". It is from page 60 of The Changing Body: Health, Nutrition, and Human Development in the Western World Since 1700 by Drs. Roderick Floud, Robert W. Fogel, Bernard Harris, and Sok Chul Hong. The chart is rather complex, so take a moment to peruse it, and then I'll explain.
For Americans, refer to these tables:

Height
160 cm = 63.0" ≈ 5'-3"
170 cm = 66.9" ≈ 5'-7"
180 cm = 70.9" ≈ 5'-11"
190 cm = 74.8" ≈ 6'-3"
(to convert cm to inches, divide by 2.54)

Weight
40 kg = 88 lbs
60 kg = 132 lbs
80 kg = 176 lbs
100 kg = 220 lbs
(to convert kg to lbs, multiply by 2.2046)
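
The two conversion rules can be written as a small Python sketch for anyone who wants values that are not in the tables. The function names are my own, and rounding to the nearest whole inch is an assumption chosen to match how the height table is rounded.

```python
CM_PER_INCH = 2.54      # divide centimeters by this to get inches
LB_PER_KG = 2.2046      # multiply kilograms by this to get pounds


def cm_to_feet_inches(cm):
    """Convert centimeters to (feet, inches), rounded to the nearest inch."""
    total_inches = round(cm / CM_PER_INCH)
    return divmod(total_inches, 12)


def kg_to_lb(kg):
    """Convert kilograms to pounds."""
    return kg * LB_PER_KG


for cm in (160, 170, 180, 190):
    feet, inches = cm_to_feet_inches(cm)
    print(f"{cm} cm = {cm / CM_PER_INCH:.1f} in, about {feet}'-{inches}\"")

for kg in (40, 60, 80, 100):
    print(f"{kg} kg = {kg_to_lb(kg):.0f} lbs")
```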

The Iso-BMI curves (the dashed lines) set a foundation. The heavy curve shows minimum risk of dying in the next ten years. To the left is underweight, and to the right is overweight. Now, the iso-mortality-risk curves (the thin solid lines) show relative risk of dying within ten years, gleaned from a study of middle and late middle aged Norwegian men, and correlated with the sketchier records of the past 300 years throughout Europe. The curve labeled 1.0 passes through the minimum-risk curve at 1.675m (66") height. At that point, the healthiest weight is 72 kg (159 lbs). But a person taller than 1.84m (72.4"), weighing about 80 kg (176 lbs) has a 30% lower mortality risk!

Hmm. I am 6'-0" and 207 lbs (1.83m and 94 kg). That puts my relative mortality risk at 0.9. That is 15-20% worse than the optimum for my height. More properly, among a group of men just my size, 20% more of them are likely to die in the next ten years, compared to those at optimum weight/BMI. That is according to this Waaler surface.

But the little circled × and + signs mark the important historical message of this chart. Throughout most of the past 300 years, the French were shorter than the British, and less healthy. However, the trend through time has brought the two populations to near parity, with average heights of about 1.77m (just under 5'-10"), and though the British tend to be heavier, their mortality risk is the same (they are on the same iso-mortality-risk curve).

A little further along in the book, Table 2.5 on page 67 lists the average height for men in six European countries, as it has changed since 1750. I converted the data into this chart:

A quarter-century marked 18-IV, for example, refers to the fourth quarter of the 18th Century, the years 1775-1799. That is the range of dates during which men in a particular cohort reached physical maturity.

I am not sure what happened in Denmark since 1975, but for three of the other five countries the height scoops up toward the 1.8m range, then slows down approaching the year 2000. The last data point is missing for Hungary, and we see that France is apparently a little behind England and the Scandinavian countries, perhaps lagging by 25 years. (It is tempting to speculate that the different lines will converge on some optimal average height, but only time will tell.)

This is something the authors remark upon, that in the richer western countries the trend of increasing height seems to be slowing down. Perhaps there is a genetically-determined maximum height for each family, and as barriers are removed, through better nutrition, better medical care and less need for very heavy labor, that maximum is being reached by a larger and larger number in each generation.

The book is a textbook for a socio-economics course. Thus it is very dense with figures and tables and discussions of the sources and assumptions and calculations needed to reach the authors' conclusions. In a way, the plethora of illustrative material shortens the reading substantially. Reading a text-only book of this density, of 374 pages, would have taken somewhat longer than the 12 days it took me! I can swallow up a page-size table quicker than I can read a page of scientific text. I would say that the first 80% of the book is devoted to "confidence building": presentation of the facts and sources and so forth, so that the later conclusions will be more readily accepted. If the authors choose to write a more popularized version, I'd expect something quite a bit more brief.

But this book is quite valuable as it is, for those willing to slog through it. All that data on size and weight and purported health are fine, but what has caused it? And are we really healthier? Not all agree that we are. Much of the work discussed, work done by the authors and many colleagues, and by others, including those who may draw different conclusions, is intended to tease out mortality and morbidity as they have progressed through time. Prior to about 1930, when the first effective antibiotics were developed, we see that average size and health were already increasing. The "Third World" is presently living an 18th-Century life, nutritionally and medically, and it shows; average adult male heights for most groups are near 1.65m, plus or minus 5 cm or so.

However, the discussions do not dwell only on averages. One large section deals with distributions of caloric intake, compared to the basal metabolic requirement plus "maintenance", which is necessary to stave off long-term starvation. Throughout the 1700s and into the early 1800s, 15-30% of the poorest didn't have enough to eat, and were starving. Even those just a little better off had little energy to work. Did you ever wonder why the beggars in poor countries (and poor areas of richer ones) do little more than sit around begging? I am not talking about panhandlers on American city streets, but the truly destitute. It is because they eat so little they have no energy left for any kind of work.

For example, if a man of a certain size could sleep all day, his basal metabolism might require 1,600 calories (AKA kcal). Yet he needs to awake from time to time to, at least, eat and eliminate. Also, his body uses some energy replacing worn out tissues. These maintenance activities bring his caloric requirement to 2,030 (an added 27%, or 430 kcal). Just walking around takes about 3 times as much energy as sitting still (2x for the walking and 1x for basal). This factor of 3 is a Physical Activity Ratio, or PAR; a 24-hour walk would require 4,800 kcal (the exercise plus the basic) plus 430 (maintenance) for a total of 5,230. Of course, a full-time panhandler won't walk 24 hours, but may walk a few hours a day. If we assume 4, the total caloric requirement is 2,030 plus about 530, or 2,560 kcal. If this person does anything else except lie around quietly, more calories are needed.
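
The arithmetic in this example can be laid out as a short Python sketch. It is only my reading of the calculation above, not a formula quoted from the book: each hour costs the basal hourly rate times the PAR for whatever is being done, any hours not listed count as rest at a PAR of 1, and the 430 kcal of maintenance is added once per day. The name `daily_kcal` and its parameters are hypothetical.

```python
BASAL_KCAL_PER_DAY = 1600   # basal metabolism for the man in the example
MAINTENANCE_KCAL = 430      # waking to eat and eliminate, tissue replacement


def daily_kcal(activities, basal=BASAL_KCAL_PER_DAY,
               maintenance=MAINTENANCE_KCAL):
    """Estimate one day's energy need in kcal.

    activities is a list of (hours, PAR) pairs. Hours not listed are
    assumed to be spent at rest (PAR 1).
    """
    listed_hours = sum(hours for hours, _ in activities)
    if listed_hours > 24:
        raise ValueError("activities add up to more than 24 hours")
    per_hour = basal / 24
    active = sum(hours * par * per_hour for hours, par in activities)
    resting = (24 - listed_hours) * per_hour
    return active + resting + maintenance


print(daily_kcal([]))           # no activity beyond maintenance
print(daily_kcal([(24, 3)]))    # the hypothetical 24-hour walk
print(daily_kcal([(4, 3)]))     # four hours of walking
```

The four-hour case comes out a few kcal above 2,560 only because the text rounds the walking increment to 530.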

At the high end of the scale, assuming a farmer can eat all he needs to, the PAR for most farm labor is about 5. Farmers work long days; 10 hours or more is quite common, 7 days per week. Then many around-the-home activities have a PAR around 2. A farmer's energy budget may resemble this:
  • sleep 8 hr: 530
  • labor 10 hr: 3,330
  • family interaction 3 hr: 400
  • read, talk, etc. 3 hr: 300
  • body maintenance: 430
That all adds up to 4,990 kcal that he needs to eat. The next time you are at a café that farmers or farm hands frequent, don't be surprised by their enormous breakfasts! However, for those without such ready access to abundant food, such as the lower classes in 18th Century Europe, undernourishment was the norm. Dad was underfed and couldn't work hard enough to earn much, Mom was underfed and so had undersized babies; she had a scant milk supply and after being weaned the youngsters frequently went hungry. They grew up short and scrawny. Then, being undernourished made them all less able to fight off disease, which also takes energy. Their lives were short and hard, and other charts in the book show how size correlates with social class. Simply put, the aristocrats were taller because they could afford sufficient food, and being better nourished means they were less often ill.
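
The farmer's energy budget listed above can be tallied line by line in a few lines of Python. This is a sketch under stated assumptions: the basal rate of 1,600 kcal per day and the 430 kcal of maintenance come from the earlier example, sleep is taken at a PAR of 1, and the PAR of 1.5 for reading and talking is my inference from the 300 kcal in the list, not a number given in the text.

```python
BASAL_KCAL_PER_HOUR = 1600 / 24   # basal rate from the earlier example
MAINTENANCE_KCAL = 430

# (activity, hours, PAR). The 1.5 for reading and talking is inferred
# from the 300 kcal shown in the list, not stated in the text.
farmer_day = [
    ("sleep", 8, 1.0),
    ("labor", 10, 5.0),
    ("family interaction", 3, 2.0),
    ("read, talk, etc.", 3, 1.5),
]


def tally(day):
    """Return a dict mapping each activity to its kcal cost for the day."""
    rows = {name: hours * par * BASAL_KCAL_PER_HOUR
            for name, hours, par in day}
    rows["body maintenance"] = MAINTENANCE_KCAL
    return rows


budget = tally(farmer_day)
for name, kcal in budget.items():
    print(f"{name:20s} {kcal:7.0f}")
print(f"{'total':20s} {sum(budget.values()):7.0f}")
```

The unrounded total runs a few kcal above 4,990 because the list rounds each line to the nearest 10 before adding.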

The epigenetic effects of such a history can't be shrugged off in a single generation. The 300-year chart above shows how gradual the change was. Increasing food supplies came first, and it took a few generations for size changes to follow. 300 years represents 12-15 generations.

Now we can consider life expectancy. At this point, data for women was also available. This chart, from page 148, shows the increase just through the 20th Century. The bars represent average remaining life expectancy. Thus, for 65-year-olds, only a dozen or so years remain, meaning that on average they can expect to live to age 80 (men) or 85 (women; for the 2000-2002 figures).

From the bottom, note that at age 0 compared to age 1, there is quite a difference in 1900, but very little in 2000, though for both there are great increases from 1900 to 2000. Infant mortality was so great in 1900, that a 1-year-old, who had already survived a few "childhood diseases", could expect on average 5-6 years more life than a newborn.

As an aside, two years ago we visited a cemetery in Missouri, where some of my ancestors are buried. I was sobered by the large number of tiny gravestones for infants and children under 5, who had died between about 1830 and 1930. The gravestones for those who survived to ages in the range 70 and up were very few. I think my great-grandfather was the only 90-year-old buried there. When he was 65, in 1935, his average life expectation was to live to about 77. He beat that, but most didn't. My father, who was 88 when we went there, had given me a list of men to look for, in the nearby town, men he had known as a child. I reported later that I found them all—in the cemetery. Dad is the last survivor of his generation from that town.

Today, it is considered unusually tragic to have a child die. Prior to 1900 it was tragically common. So perhaps the age-15 figures tell the best story. At age 15, in 1900 a young man might expect to live another 48 years, to 63. By 1950, it was 53 or 54, to almost 70, but by 2000, a 15-year-old has an average remaining expected lifetime of 62 years, to age 77. For young women, add 3-4 years to all these figures.

Now we come to the last figure I wish to show, Figure 6.8 from page 341:

The squares on the dotted lines are for Union soldiers who'd survived the American Civil War, measured and weighed at ages 40-59, and their dates of death from military pension records. The circles on solid lines are from modern non-Hispanic Americans, measured and weighed at ages 40-59 in the 1980s (61,000 of them). In both data sets, the relative mortality risk is for the succeeding decade. The simple message is clear. Being shorter is riskier, and being too skinny is also riskier. The "sweet spot" for the Civil War veterans is about 1.9m tall and BMI near 24 (87 kg: 6'-3" and 192 lbs). For the recent cohort, it is about the same height, but optimum BMI is nearer 26 (94 kg or 207 lbs).
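
BMI is weight in kilograms divided by the square of height in meters, so the weights quoted for each "sweet spot" can be recovered from the height and the BMI. A minimal Python sketch, with function names of my own invention:

```python
LB_PER_KG = 2.2046


def bmi(weight_kg, height_m):
    """Body mass index: kilograms divided by meters squared."""
    return weight_kg / height_m ** 2


def weight_for_bmi(target_bmi, height_m):
    """Weight in kg that gives target_bmi at the given height."""
    return target_bmi * height_m ** 2


# The two "sweet spots" at 1.9 m: BMI 24 (Civil War veterans)
# and BMI 26 (the 1980s cohort).
for target in (24, 26):
    kg = weight_for_bmi(target, 1.9)
    print(f"BMI {target} at 1.9 m: {kg:.1f} kg, {kg * LB_PER_KG:.0f} lbs")
```

The results agree with the figures in the text to within a pound of rounding.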

Both BMI charts show that risk rises gradually upwards of 26, and picks up faster after 30, the criterion for "obese". However, a Waaler chart such as the one above is needed to ferret out whether being short and heavy is better or worse than being tall and heavy. Look at the Waaler surface above: at any BMI between 19 and 31 or 32, being over 1.8m tall is better than being 1.6m tall at any weight!

But this introduces a new and sobering realization. In the west we have gone from being undernourished to overnourished. In America 30-40% are now obese, and the European nations are following closely behind. Mortality and morbidity (chronic debilitation from illness) are increasing again. The number of morbidly obese (BMI more than 40) is the most rapidly increasing segment of American society. I'd be morbidly obese at 295 lbs, or 134 kg. These days I know lots of folks over 300 lbs, and a few in the 400-plus category.

As the authors of Changing Body show, there has been an amazing run of three centuries, with improving nutrition, public health and medical care, that has led to taller and heavier and stronger and even smarter people throughout the west. But we are going over the top. Now we spend about as much for weight loss programs and products as we do for food. How long will it be, until, if nothing else, those who most easily grow exceedingly obese are culled from the gene pool?

Wednesday, July 03, 2013

Tree of beauty, tree of healing

kw: book reviews, nonfiction, trees, natural history

The Ginkgo is not only one of the most beautiful trees, it is distinctive. Once you have seen one, you will likely recognize one at a distance. Here in the U.S., few have seen really big ginkgo trees. The oldest one in the country is probably one in Georgia that is said to be 222 years old, planted to commemorate a visit by George Washington in 1791. The largest is probably one in Connecticut, planted in 1860, and 33m (110ft) tall. Several American ginkgoes have a girth of about 5m (16ft), or a diameter in the range of 1.6m (5ft). Compare those with one in Yongmunsa, Korea, that may be as old as 1,100 years, with a height of 62m (203ft), a girth of 14m (46ft) and thus a diameter near 4.5m (15ft). One tree in China has a diameter over 5.8m (19ft). However, most of the "street tree" ginkgoes in the West are less than 100 years old, perhaps 20m (65ft) tall and half a meter (20in) in diameter. I see a row of ginkgo trees of about that size when I visit the Hagley Museum near Wilmington, Delaware.

The Chinese, Japanese and Koreans seem to have found a medicinal use for almost everything. While the American FDA has about 800 approved drug substances (and more than 100,000 formulations), an apothecary shop in Asia is likely to have 10,000 or more ingredients and an uncounted number of ways of compounding mixtures. Extracts and other preparations of ginkgo leaves, seeds and bark are used to treat many bodily ailments. In the West, we seem to limit ourselves to a leaf extract that is mildly effective for memory problems and perhaps for some forms of dementia. My mother used it once she began to suffer dementia (my father remembered for her, when to take it).

A very comprehensive and readable account of ginkgo natural history is found in Ginkgo by Peter Crane, former Director of the Royal Botanic Gardens, Kew, England. I knew the genus Ginkgo is ancient but I didn't realize just how ancient. Though the species Ginkgo biloba as described by Linné is a few millions or tens of millions of years old, numerous fossil ginkgo species date back as far as 300 million years or more. The ginkgo line survived the great Permian mega-extinction event. However, they almost didn't survive the Pleistocene!

A few million years ago, a small number of different ginkgo species had worldwide distribution. The ice ages that began about 5.5 million years ago, and ramped up in earnest some 2.5 million, nearly did them in. By about 5,000 years ago, as well as botanists and paleontologists can determine, only a few small groves of ginkgo trees survived in remote parts of China. However, humanity, the agent of the current mega-extinction, in this case saved the ginkgo, partly because the Chinese liked the edible nuts, and perhaps also because of their beauty (I suspect if poison ivy were in danger of extinction, most people would gladly let it go!).

In seven sections and 37 compact chapters, Dr. Crane introduces us to the natural history of the tree, its ebb and flow through time, and how ginkgoes and humans have interacted for at least a couple of thousand years. Most folks are impressed by things like record size, but these are neither the largest of trees, nor the longest-lived. They have other unique features.

As I mentioned above, the tree's outline is distinctive, though certain old pine trees can look very similar. But the leaf! There is nothing like it. The species was named for its bilobed leaf. Not all ginkgo leaves have the central cleft, and on older trees, few do. But with or without it, the fan-shaped leaf is unique. It is said that the Chinese love the ginkgo because its leaves resemble their paper fans. I suspect the leaf inspired the paper fan!

Also, in contrast to all the common "hardwood" trees, the ginkgo is not a flowering plant. It is more closely related to "seed ferns". The male tree bears small pollen cones, and the female tree bears somewhat larger ovule cones. They are wind pollinated. Few trees are dioecious (that is, having separation of sexes), but some familiar trees are, such as red maple and yew. But ginkgo fertilization is nearly unique among living seed plants, a trait shared only with the cycads. The sperm swims to the ovum, once released from the pollen grain upon contact with the ovule. Finally, ginkgo leaves turn yellow, seemingly all on the same day, in early fall. Where they have been naturalized into a forest, an aerial view of the forest in early October (in the North, anyway!) will have a polka-dot appearance. By the time their yellow leaves fall, the reds of maples and other such trees will dominate.

Because of quirks of human nature, ginkgoes have been preserved. Without those quirks, they might have gone extinct hundreds of years ago. Can we muster up a sufficiently widespread love of other trees that are now endangered, to preserve them as effectively?