kw: analytical projects, art generation, ai art, statistics, statistical distributions, lognormal, scale free
I began using art-generating software in November 2022, when DALL-E 2 became available. Since then, I've enjoyed having a series of art-generating "engines" available, including numerous engines (called "models") in the aggregators Leonardo AI and OpenArt. As often as I can, I generate images for this blog; in some cases, I download images I find on the Internet. However, my primary artistic pastime is creating images of things and scenes I imagine.
Just in the past few days I was inspired by a heavy snowfall to find short poems about snow, and use them to create wintry images. This image was drawn by Nano Banana Pro, under the Leonardo AI umbrella with "None" as the style; that is, native NB Pro. The aspect ratio was set to 16:9. It displays the entire poem, something NB Pro can do better than any other art engine I have found. The prompt was "Watercolor painting evoked by a poem:", followed by the text of the poem "The First Snow" by Charlotte Zolotow.
The image is particularly evocative in shifting to an exterior view as the window dissolves. I suspect there are a number of images that use this device in the training material for NB Pro.
When I made signed versions of this image and several others generated in the same session, to be included in a folder for a "screen saver" slide show, I began thinking about how many images of each type I've created in the past three-plus years. Last year I went through my (poorly organized) folder stack of "AutoArt" and reorganized it into 35 categories, each in its own folder. To date, there are 1,472 signed images in these 35 folders, each of which contains between 2 and 405 images. My inner statistician began to stir…
The image below shows two analyses of the statistical distribution of the numbers of files in these folders.
Charts like these make it quite evident which statistical treatment is appropriate to a particular set of data. I'll explain what these charts mean and how they were created.
"Scale Free" is a type of power law distribution related to the Pareto distribution. It is easy to analyze, which makes it popular. To analyze a series of numbers graphically in Microsoft Excel:
- Enter the numbers in column B, starting in cell B2.
- Put an appropriate header in cell B1.
- Highlight these data (B1:B36 in this case).
- Sort from largest to smallest, using the Sort & Filter section under Editing in the Ribbon.
- Enter 1 in cell A2 and 2 in A3.
- Put a header in cell A1; I usually put "N".
- Highlight cells A2 and A3.
- Double-click the fill handle at the lower right of A3. This will fill the rest of the column with numbers in order, as far as the data goes in column B. In this case, we get numbers from 1 to 35.
- Highlight these two columns to the end of data. In this case, from A1 to B36.
- In the Ribbon, use Insert and in the Charts section, select the icon showing scattered dots with axes; this is the X-Y (Scatter) chart.
- The title of the chart is whatever the header text is in B1. Edit as you wish.
- Double-click one of the axes to open the Format dialog.
- Click Logarithmic Scale near the bottom of the menu.
- Click the other axis and also click Logarithmic Scale. This is now a log-log chart.
The result will be similar to the upper chart. Now for the lognormal analysis, beginning with these two columns of numbers:
- Insert a new column between A and B; this is the new column B.
- In cell B1 enter a header such as "Prob.". You are going to create a probability axis.
- In cell B2 enter this formula (where the largest number in column A is 35):
=NORM.S.INV((A2-0.5)/35)
- Double-click the fill handle at the lower right of B2 to fill the column with the formula.
- Highlight the data in B and C (B1 to C36 in this case).
- Use Insert as before to create an X-Y Chart.
- Edit the chart title.
- Note that the vertical axis is now centered above the zero.
- Assuming the Format dialog is still open, click the horizontal axis.
- In the middle of the menu in the section "Vertical Axis Crosses", click the bubble at "Axis Value".
- Enter "-3".
- Click the vertical axis and click Logarithmic Scale. This is now a log-probability chart.
- If you want the markers to be a different color, click one of them. The Format Data Series menu appears at the right.
- Select the icon of a paint bucket pouring paint.
- Click the Marker tab.
- For both Fill and Border, select the color you want.
This will be similar to the lower chart. For the data I used, the points in the lower chart fall approximately along a straight line. By contrast, the upper chart has a definite downward bend. A scale-free (power law) distribution would plot as a straight line on a log-log chart, so a bend like this indicates that the distribution is not scale free and is more likely to be lognormal, or even normal (Gaussian). In this case, the second chart shows that lognormal is a good model of the data distribution.
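For readers who would rather check this outside Excel, here is a minimal Python sketch of the same two transformations. It is an illustration, not the workbook behind the charts: the function names are my own inventions, the "straightness" score (absolute Pearson correlation) is a stand-in for judging the charts by eye, and the data are synthetic values placed on lognormal quantiles with made-up parameters, not my folder counts. `NormalDist().inv_cdf` from the standard library plays the role of `NORM.S.INV`.

```python
import math
from statistics import NormalDist


def rank_size_points(counts):
    """Upper chart: (log10 rank, log10 size), with rank 1 = largest value."""
    ordered = sorted(counts, reverse=True)
    return [(math.log10(rank), math.log10(size))
            for rank, size in enumerate(ordered, start=1)]


def log_probability_points(counts):
    """Lower chart: (normal quantile of (rank - 0.5)/n, log10 size)."""
    ordered = sorted(counts, reverse=True)
    n = len(ordered)
    std_normal = NormalDist()  # inv_cdf is the counterpart of NORM.S.INV
    return [(std_normal.inv_cdf((rank - 0.5) / n), math.log10(size))
            for rank, size in enumerate(ordered, start=1)]


def straightness(points):
    """Absolute Pearson correlation of the points; 1.0 is a perfect line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return abs(sxy) / math.sqrt(sxx * syy)


# Synthetic stand-in data (NOT the folder counts from this post): 35 values
# placed exactly on lognormal quantiles, with made-up parameters mu and sigma.
mu, sigma = 4.0, 1.2
synthetic_counts = [math.exp(mu + sigma * NormalDist().inv_cdf((i - 0.5) / 35))
                    for i in range(1, 36)]

print("log-log straightness:        ",
      round(straightness(rank_size_points(synthetic_counts)), 3))
print("log-probability straightness:",
      round(straightness(log_probability_points(synthetic_counts)), 3))
```

With lognormal input like this, the log-probability score should sit at essentially 1.0 while the log-log score falls noticeably short of it, which is the numeric counterpart of the bend in the upper chart. To try it on real data, pass your own list of counts to the two point-building functions.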
This is an illustration of the Theory of Breakage, formally described by A.N. Kolmogoroff in 1941. When an area is divided (US state or county areas are good examples), the resulting areas have a lognormal distribution. When a sheet of glass is broken, the weights of the pieces also have a lognormal distribution (I've done this experiment). Some recent publications claim that a theory of breakage produces a power law distribution, but this is false. Certain phenomena in nature tend to be normally distributed. The classic example is the height of adult men, or of adult women (but not both combined), in a population such as the residents of a particular town or county. However, most phenomena produce groups of measurements that are lognormally distributed; that is, the logarithm of the quantity being measured is distributed as a normal, or Gaussian, curve.
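A toy simulation gives some feel for why breakage leans toward lognormal. The sketch below is my own simplification, not Kolmogoroff's derivation: it starts with one piece of mass 1 and, in each generation, splits every piece in two at a random fraction. The function names, the number of generations, the random seed, and the fraction bounds (0.05 to 0.95, chosen only to avoid zero-mass pieces) are all assumptions made for illustration. Each final mass is a product of random fractions, so its logarithm is a sum of random terms, and sums of many random terms tend toward a normal shape.

```python
import math
import random


def break_into_pieces(generations, rng, low=0.05, high=0.95):
    """Split every piece in two, `generations` times, at random fractions."""
    pieces = [1.0]  # start with a single piece of mass 1
    for _ in range(generations):
        next_pieces = []
        for mass in pieces:
            fraction = rng.uniform(low, high)  # assumed splitting rule
            next_pieces.append(mass * fraction)
            next_pieces.append(mass * (1.0 - fraction))
        pieces = next_pieces
    return pieces


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(2024)  # fixed seed so the run is repeatable
masses = break_into_pieces(12, rng)
log_masses = [math.log(m) for m in masses]

print(len(masses), "pieces, total mass", round(sum(masses), 6))
print("skewness of masses:    ", round(skewness(masses), 2))
print("skewness of log masses:", round(skewness(log_masses), 2))
```

The raw masses should come out strongly skewed (a few large pieces and a great many tiny ones), while their logarithms should be far closer to symmetric, as a lognormal model would predict. This is only a plausibility check under the stated assumptions, not a proof.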
I could go further into this, but this is enough for the purpose of this post.

