Friday, July 13, 2012

Maybe worse than impossible

kw: programming, computer science

A couple of decades ago I was working on a program to simulate the way oil fills up a "reservoir", really an impermeable surface with some shape that could trap oil as it trickled upward. A colleague and I tried out scheme after scheme, mainly based on how hard each one was to write into computer code. Of course the easiest one to write elicited my colleagues comment, "Boy, that is pretty bogus." He meant that whether it worked or not, it was a very inefficient way to proceed. We eventually found a pretty efficient method. Other projects throughout the years have been, sometimes a search for efficiency, and sometimes a frantic scramble to avoid too much bogosity.

Definition: Bogosity in computer programming is the inverse of efficiency. Where "bogus" usually means "fake", to a programmer it means a bad way to do something, one that will make the computer take too long. But we need to see just how bad "bad" can be. The standard programmer's example is sorting, so I'll use it.

Sorting data has received huge amounts of attention from thousands of bright people because it is needed so frequently, and is very slow unless clever methods are employed. Sorting bogosity is easy. The (almost) easiest method is one you can do by hand, and it matches the way people typically operate when sorting a small number of things, such as lining up a dozen rocks from smallest to largest. This easy method is called the Bubble Sort:
  • Line up the rocks and look through them for the largest. Put it at the end of the line.
  • Look for the largest among those that remain. Put it next to the other one.
  • Repeat.
We can do this rather quickly because we can compare several items at once by eye. But in a computer, it must be done by comparing them pairwise. So a computer would have to use a few extra steps:
  • Load numbers representing the "size" of the rocks (such as weight. That means weigh them first).
  • Compare the first two numbers. If the first is bigger than the second, swap them.
  • Compare the next two numbers, and so forth, swapping as needed, until you get to the end of the list.
  • At that point, the last storage location contains the largest number.
  • Start over, but stop one short of the end. Now you have two numbers in order.
  • Repeat until you have made 11 passes through an ever-shortening list of numbers.
For 12 rocks, the program made 11 comparisons, then 10, and so forth, for a total of 66. That's OK for a dozen items, but what about sorting a deck of 52 cards? Once you determine which order the suits go in (whether to sort by suits or by numbers within suits), you have to make 1,326 comparisons. That is a lot for a human, though it is pretty quick for a computer. But computer programs need to do a good job even if sorting tens of thousands of items. For 10,000 items, the bubble sort takes almost 50 million comparisons.

A different kind of sort works better. I'll describe one variation of the Shell Sort. It takes fewer comparisons, but uses some extra space. First, for 12 rocks:
  • Line up the rocks as before.
  • Compare the first two. In a nearby space put the smaller one on the left and the larger one on the right.
  • Compare the next two from the original line. Put them in order nearby.
  • Continue until you have 6 sorted pairs.
  • Now take the leftmost rock from the first pair and compare it to the leftmost rock from the second pair. Put the smaller rock at the left end of a new line. It is the smallest rock of the four.
  • From whichever pair that smaller rock came from, pick up the other rock. Compare it with the one you are still holding.
  • Put the smaller one next to the first, smallest rock.
  • Pick up the fourth rock.
  • Compare it with the one you are still holding. Put these two rocks in order next to the first two. Now you have four sorted rocks.
  • Continue with the next pair of pairs.
  • Continue with the third pair of pairs. Now you have three lines of four sorted rocks.
  • These were merge operations. Perform a merge operation on the first two lines of four. This results in a line of eight sorted rocks, and the other line of four is still there.
  • Merge the line of 8 and the line of 4.
If you were counting comparisons, there were 33. You went through the rocks four times instead of 11. Half the effort, at the cost of a little more complexity of planning. To sort 52 cards this way, you'd wind up making 253 comparisons. That is about 19% of the original effort to sort the deck. The general formula for comparisons is N Log/2(N), so sorting 10,000 items takes about 133,000 comparisons, or 1/376th the effort!

There are clever variations of the shell sort that reduce overhead a little, but that is basically the most efficient sort method. But we were talking about bogosity here. The bubble sort is quite "bogus" compared to the shell sort, which is what my friend meant. Is more bogosity possible?

Certainly. For example, if you play Klondike solitaire with the cards, you are only going to win about one time in four if you don't cheat, so each game you play will have some number of comparisons that is less than 1,326, unless you win, but the comparisons are accompanied by "overhead": stack moves and dealing and so forth that greatly lengthen the time to produce a sorted deck even if you win the first game. A typical "sort" might take four games of average length about 700 or nearly 3,000 total comparisons.

Let's declare that the bubble sort has a bogosity of 1. Then a series of games of Klondike
that leads to a win would have a bogosity of 2. Other solitaire games will also be 2's, though they vary a little in how frequently you win without cheating.

This is not nearly as bogus as it gets: I don't know what number to give it, maybe 100, but there is a "sort" that has maximum bogosity, so far as I know. We can call it the Bogo Sort:
  • Throw the cards across the room.
  • Pick them up and flip the face-down ones face up.
  • Look through the deck to see if they are in order.
  • If not, repeat. (Even if only one card is out of place! No cheating by moving it)
The average number of times you'd have to repeat this operation to achieve a sorted deck is a number with 68 digits (roughly 4E+67). If you could "throw-pick up-check" once per minute, it would take about 1E+62 years (That is a hundred trillion trillion trillion trillion trillion or so). Now that is real bogosity.

No comments: