Friday, July 30, 2010

More galloping

kw: observations, analysis, mathematics

I posted just two days ago on error propagation in the Logistic Equation
Xn+1 = r Xn (1 - Xn)
I used an r of 3.625, which is well in the chaotic region, and has an exact representation in floating-point variables, and a starting X of 0.5. The equation, for any value of r less than 4.0, produces a series of numbers between 0 and 1. The factor X(1-X) makes this well-suited to minimize error propagation, because the error in X will be balanced by an equal and opposite error in (1-X).

I crafted a rational-number method for use in Excel that has double the precision of raw Excel variables. I used this to prepare a set of "standard" results from the Logistic recursion. I compared those results with ordinary calculations, and with reduced-precision calculations at four ROUND settings, of 15, 12, 9 and 6 digits. Though Excel displays up to 15 digits of a number, which implies about 50 bits of precision, internally, it does calculations to a higher precision, providing "guard digits" and making for fewer infelicities like 1/(1/3) producing 3.00000000000001.

As I expected, the column of raw Excel calculations subtracted from the "standard" showed that the internal representation is actually about 19 digits, or 64 bits of precision. This squares with a report or two I have read that claim Excel uses IEEE 80-bit floating point values, though I don't find any Microsoft publications that confirm this.

This chart shows the accumulating errors of each data set.

I truncated the lines for 6, 9, and 12 digit rounding at the point just beyond where the errors "saturate", or become equivalent in size to the data. Two issues are instantly visible. Firstly, 15-digit rounding has nearly no effect on total rounding error accumulation. Secondly, the 9-digit rounded sequence has fortuitously produced a highly accurate value at about step 75, which delayed the onset of ruinous error accumulation by about 20 steps.

I have observed in the past, for many recursive calculations, that rounding errors tend to double in absolute magnitude with each step. Only a well-crafted algorithm plays errors against one another to slow down this phenomenon. Such is the case with the Logistic recursion. The next chart shows the absolute values of the errors for each sequence, in semi-log coordinates.

Here we can see more clearly what is happening. Each data set has a characteristic trajectory. At step 75, the 9-digit-rounded sequence has been "reset" to a level characteristic of about twenty steps earlier. And though the Raw and 15-digit-rounded sequences start a little differently, they follow nearly identical paths after the tenth iteration.

I did a regression on the central 80% of each of these sequences (except for 9-digits, which I truncated at step 75), which confirmed that the errors take about ten steps to increase by a factor of ten, which is a doubling rate of about 3.3 steps per doubling. However, the details show an interesting phenomenon, shown in the third chart:

These are about the first thirty errors (steps 3-32; steps 1 and 2 were error-free) for the 9-digit and 12-digit sequences. The errors grow rapidly for four or five steps, then drop by an order of magnitude, then repeat. I looked at the numbers. In all the sequences, whenever the resulting X is close to 0.5, the next step will result in a much more accurate value. The steep slope of these mini-sawteeth is one of doubling at every step. This periodic resetting of the error propagation accounts for the ability of these calculations to proceed for so long (more than 100 steps for 15-digit-rounding and for raw without rounding) with so little error.

I expect that my rational algorithm has a precision of 30-40 digits, depending on the way Excel is handling the guard digits. I'll use these data to predict how far it can take the sequence without being swamped by error. The regressions I performed can be extrapolated to predict the point at which |error| equals 1. For the 6, 9, 12, and 15-digit algorithms, they are at step 63, 90, 116, and 156. These indicate that each increase by one digit in the accuracy of the calculations gains nine or ten steps of meaningful results. That means that a 30-digit-accurate calculation sequence can go about 300 steps in the Logistic recursion, at which point the values are no longer meaningful.

This points up the value of doing analytical math before starting any calculation that relies on a recursion, such as a numerical integration. It also underlies the mantra I heard from teachers in my advanced math courses: use the highest order method, with the longest steps, that you can. If, as I have frequently found, many algorithms experience a doubling in error with each step, you really have to plan for a method that will recur no more than 50 steps for an initial precision of 50 bits (15 digits). The Mandelbrot set is notorious for such ruinous error propagation. Many of the pretty pictures you see are highly magnified noise that has converged on an aesthetically pleasant configuration. More to the point, the wisest programmers have used better algorithms and even "bignum" calculations (my rational method is a simple bignum algorithm).

Some nitty gritty for those who are interested. My rational algorithm worked as follows. I used the ratio of two Excel variables to represent r, X and (1-X). The numerator and denominator (n and d) for r were 1.9258125 and 0.53125. The n value was chosen to be close to the square root of 3.625, but for both n and d to be exactly representable by Excel floating point values. The starting values for X were 1 and 2, representing 0.5.

At every step (1-X) was calculated by subtracting the n for X from its d and using the same d. All three n's and all three d's were multiplied to get the n and d for the next X. Though X stays between 0 and 1, its n and d can grow very large or very small, in synchrony. Whenever the exponents for the n and d values approached 100 or -100, I normalized the current X (multiplying its n and d by an appropriate value) and continued. That means that running this method needed manual intervention about every eight to ten steps. However, doing this manually gave me a number of good ideas about programming such a method in C or Perl should I want to investigate further.