kw: generated art, ai experiments, surveys, simulated intelligence, prompts, prompt adherence
Introduction
This is quite long so I include headings.
I remembered a science fiction story that I read decades ago, about a planet of creatures that looked a lot like humans, but had wings and flew about. Pondering a way to illustrate the central idea of the story, after some experimentation, I came up with this prompt:
On a planet with low gravity and dense air, many winged men and winged women are flying into and out of very tall buildings with doorways and landing platforms at every level
I was primarily interested in the great variety of imaging Models offered by Leonardo AI, and I had thousands of credits available. I surveyed nearly all the possibilities in Classic State, which yielded 184 images, all based on a particular Seed; more on that anon. I also used the prompt to produce much smaller suites of images with Dall-E3, DreamStudio, Gemini, and ImageFX. To jump ahead to a useful conclusion, I found that Dall-E3 produced images closest to the meaning of the prompt; in the lingo of the field, it has the greatest Prompt Adherence. Here are two examples, the best of the 24 images I gathered from Dall-E3:
I am showing two images because the first has the best people and the second has better buildings. When prompting an art generating program, it takes persistence and cleverness to get good adherence to a complex prompt. BTW, did you notice the figure with wings on backward?
Models and Styles
Leonardo AI (henceforth Leo) has thirteen Models or Presets and each has a number of Styles, as many as 23. This is a list of the 13, with the number of Styles each includes, and the cost in credits per four images, the usual number. The free program assigns 150 credits daily. It is easy to use them up pretty fast! The basic monthly subscription assigns 8,500 credits per month for $12. These pile up fast unless you are very active. I wish there were an in-between subscription level such as 3,000 credits for $5.
The total number of Styles is 61, and the number of Model+Style combinations is 194. I did not quite use them all. Several Styles are named Monochrome or include B&W in their name and those don't interest me.
Seeds
Leo also has the option of using a fixed Seed. Leo's help text says that this Seed is used to generate the "noise" that the reverse-diffusion process starts with to produce an image. That is not all there is to it, because the Seed value also influences the Transformer (the routine that figures out object identity and placement, which defines the goal toward which the diffusion algorithm operates). Seeds in Leo (also DreamStudio) have six digits.
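The effect of a fixed Seed on the starting noise can be illustrated with a small sketch. This is not Leo's code: it uses Python's standard pseudo-random generator as a stand-in, and the function name starting_noise is hypothetical. It only shows that the same Seed always yields the same noise, while a different Seed yields different noise.

```python
import random


def starting_noise(seed, count=8):
    """Return `count` pseudo-random values from a generator seeded with `seed`.

    A stand-in for the noise a diffusion process starts from; real image
    generators use far larger arrays and their own random generators.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


same_a = starting_noise(428575)
same_b = starting_noise(428575)
other = starting_noise(428571)

print(same_a == same_b)  # True: same Seed, same noise
print(same_a == other)   # False: different Seed, different noise
```

This covers only the noise part of the story; it says nothing about how the Seed might affect other stages of image generation.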
When I set a fixed Seed, I tend to pick Seed values that are a multiple of 999,999/7 or 999,999/13, all of which have six digits except 999,999/13, which is 76923. After a period of noodling around, I found the region around 428571 (3*999,999/7) most interesting and settled on 428575. All the images shown here were produced using this Seed.
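The arithmetic behind these Seed values is easy to check. The sketch below lists every multiple of 999,999/7 and 999,999/13 that stays under one million; the helper name seed_candidates is hypothetical, chosen for this illustration.

```python
def seed_candidates(divisor, base=999_999):
    """Return k * (base / divisor) for k = 1 .. divisor - 1."""
    unit, remainder = divmod(base, divisor)
    if remainder != 0:
        raise ValueError("divisor must divide base exactly")
    return [k * unit for k in range(1, divisor)]


sevenths = seed_candidates(7)
thirteenths = seed_candidates(13)

print(sevenths)      # [142857, 285714, 428571, 571428, 714285, 857142]
print(thirteenths)

# Values with fewer than six digits.
short = [s for s in sevenths + thirteenths if s < 100_000]
print(short)         # [76923]

# Distance of the chosen Seed from 3 * 999,999 / 7.
print(428575 - sevenths[2])  # 4
```

The only candidate with fewer than six digits is 76923, and the Seed used for this survey sits 4 above 428571.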
Selection 1: Leonardo Lightning Style Gallery
No matter how I may group the images, it would be tedious for the reader to wade through a discourse on 184 images. Thus I picked certain themes. I chose primarily certain Styles across all the Models, but I will start by showing the gamut of Styles for the Model called Leonardo Lightning (or LLightning), which runs faster than the others and costs the least—two credits per Medium-size image (1280x720), while for most Models the cost is 3 or 4, and it goes up to 12, and even higher where larger images are available in certain Models.
Forthwith, the gamut of LLightning images, screen-captured three across from File Explorer, so the file names can be seen:
The last image above is from the next set, from the Model Phoenix 0.9.
LLightning is unique among Leo's Models; 16 of the 21 Styles shown here produced things with wings, but only 7 are winged people. The third Style, Cinematic, has what appear to be human-sized bats, but they may be humans with bat wings. It is hard to tell which, even on the full size image. Two others show beetle-like flyers, two have birds, and the flyers in the other 4 images are unidentifiable. Finally, the images from None and Unprocessed are apparently identical, which I find logical: both claim to be doing nothing "extra" to the Model. This is also seen for the Model Cinematic Kino, the only other Model that offers both None and Unprocessed.
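The counts in the paragraph above can be re-added as a quick check. The sketch below uses only the numbers already given; the category labels are shorthand for this illustration, not anything Leo reports.

```python
# Counts of LLightning Styles by what the flyers look like (from the text above).
tally = {
    "winged people": 7,
    "human-sized bats (Cinematic)": 1,
    "beetle-like flyers": 2,
    "birds": 2,
    "unidentifiable flyers": 4,
}
total_styles = 21

winged = sum(tally.values())
no_wings = total_styles - winged

for label, count in tally.items():
    print(f"{label:30s} {count:2d}  {count / total_styles:5.1%}")
print(f"{'anything with wings':30s} {winged:2d}  {winged / total_styles:5.1%}")
print(f"{'no wings at all':30s} {no_wings:2d}  {no_wings / total_styles:5.1%}")
```

The categories add up to the 16 winged results, which leaves 5 of the 21 Styles with no wings at all; winged people account for one third of the Styles.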
Compared to the other Models, this mix in LLightning is interesting. Four Models (the two Phoenix versions and the two Flux versions) adhere to the "winged" part of the prompt 100%, although only the two Phoenix versions have winged persons in all images, while the two Flux versions have more birds and fewer people. On the other hand, two Models (Graphic Design and Stock Photography) never produced wings on anything. Most of the Models yielded low percentages of winged persons. Anime is unique in a different way. Half of its Styles produced images with winged humans and two Styles have airborne humans without wings (one hopes they are floating, not falling). Only two Anime Styles had no wings at all.
Selection 2: Model Galleries for nine Styles
Style 1 = None (10 uses)
Now I will focus on particular Styles as developed in different Models. The Styles to be presented are those having larger numbers of Models that use them, in descending order by usage numbers. The first is None, meaning no Style was applied. This (non-)Style is available for the largest number of Models (10), with a modification to be mentioned below. I used a search to isolate each set of images, and a quirk of the search function is that the images are presented in reverse order.
In the next-to-last row of images, the first two appear identical. Upon very close inspection I find a few tiny differences. In the row above that, the second and third images are, so far as I can tell, identical. This suggests that behind the scenes Portrait Perfect and Cinematic Kino use the same engine, as do Graphic Design and Illustrative Albedo. So in this case the 10 Models produce 8 unique images (discounting a few nearly invisible differences in one case). These images show what each Model produces when it is not constrained by a Style.
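I compared these images by eye. For anyone who prefers to automate that kind of comparison, here is a minimal sketch, assuming each image has already been loaded as rows of (R, G, B) tuples by whatever imaging library is at hand. The function name count_differing_pixels and the tiny sample data are made up for illustration.

```python
def count_differing_pixels(image_a, image_b, tolerance=0):
    """Count pixels with any channel differing by more than `tolerance`.

    Each image is a list of rows; each row is a list of (R, G, B) tuples.
    """
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    differing = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            if any(abs(a - b) > tolerance for a, b in zip(pixel_a, pixel_b)):
                differing += 1
    return differing


# Tiny made-up 2x2 "images", for illustration only.
first = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (0, 0, 0)]]
second = [[(10, 20, 30), (40, 50, 61)], [(70, 80, 90), (0, 0, 0)]]

print(count_differing_pixels(first, first))                # 0
print(count_differing_pixels(first, second))               # 1
print(count_differing_pixels(first, second, tolerance=1))  # 0
```

A count of zero means the two images are identical pixel for pixel; a small nonzero count corresponds to the "few tiny differences" case.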
Style 2 = Dynamic (8 uses)
Dynamic is the default Style for the 8 Models that use it.
For the Dynamic Style, the image I like best is from Phoenix 0.9. It is the best match to the image I had in mind after reading the story, so long ago. Comparing all these images with the prior set, I find that for Flux Schnell the Dynamic Style produces the same image as None. The same holds for Flux Dev, Phoenix 1.0 and Phoenix 0.9. For the other Models, these Styles produce significantly different results. Three of the Models (Illustrative Albedo, Cinematic Kino and Lifelike Vision) have what I call "winged structures", although the one from Cinematic Kino looks like an immense beetle.
At this point, note that the "doorways and landing platforms at every level" part of the prompt is seldom honored; where it does appear, it is primarily in the Phoenix 0.9 images.
Style 3 = Portrait (8 uses)
As we'll see later for the Fashion Style, Portrait often emphasizes a central figure, although in the case of Illustrative Albedo, that figure is a flying structure, looking like a giant crab with 6 wings. The image from Lifelike Vision has a figure with wings that are more like hang glider wings, rather than bird wings. But, wings they are.
Style 4 = Stock Photo (7 uses)
Stock Photo is the only Style used by the Stock Photography Model. Its image is almost identical to the one produced by Cinematic Kino, but not entirely (one must look hard to find the differences). The other 5 Models all yielded winged things with this Style, but only the image for Phoenix 0.9 has winged people.
Style 5 = Ray Traced (7 uses)
Now we can start to see that certain Models, such as the two Phoenix Models and the two Flux Models, have the primary influence. In other cases, the Style seems to be "stronger" than the Model. Other than having a brighter and more colorful appearance, Ray Traced is similar to Stock Photo. For this Style, only the Phoenix Models produced flying people.
Style 6 = Illustration (7 uses plus "Anime Illustration")
It is more clear with these, compared to the prior sets, that the Style is paramount for LLightning, Illustrative Albedo, and Lifelike Vision. Here, Anime Illustration Style in the Anime Model joined the Phoenix Models in producing flying people.
Style 7 = Fashion (7 uses)
Here is a Style that produces winged people, almost 100%! Lifelike Vision's person has a winglike, flowing robe. Perhaps the presence of about 8 wings on the person in the Cinematic Kino image makes up for that… As one might expect from the name, a "fashion model" is front and center.
Style 8 = Creative (7 uses)
Following trends we've seen above, some of these differ from other Styles for a particular Model, while a couple of them are more similar. Only Flux Dev and the Phoenix Models produced flying people. The flying things in the LLightning image seem to be huge insects though one of them has feathered bird wings. I don't know why a scattering of these images feature hot air balloons.
Style 9 = 3D Render (7 uses, with two added sizes)
As mentioned earlier, most of these images were generated for the Medium 16:9 size, which is 1280x720 for most Models, but is 1184x672 for the Phoenix Models and the Flux Models. I did a side experiment to show the effect of changing image size with Phoenix 0.9. You can see in the file names for "LPhoenix09" the numbers 01a, 01b, and 01c. The image sizes are 1184x672, 1376x768, and 1472x832. When I want to use any Phoenix image for a 16:9 wallpaper, these sizes are a little off. Leonardo can upscale an image so it is larger than 1920x1080 (full HD), but then trimming is needed to get an exact ratio. Changing the size causes changes in the overall image, though the three images are similar.
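To see how far "a little off" is, the sketch below compares each size mentioned above with an exact 16:9 ratio and works out the largest exact 16:9 crop that fits inside it. The function name crop_to_16_9 is hypothetical, and the crop is computed on the original sizes rather than on upscaled images.

```python
from fractions import Fraction

TARGET = Fraction(16, 9)
SIZES = [(1280, 720), (1184, 672), (1376, 768), (1472, 832)]


def crop_to_16_9(width, height):
    """Return the largest (width, height) with an exact 16:9 ratio that fits inside."""
    unit = min(width // 16, height // 9)
    return 16 * unit, 9 * unit


for width, height in SIZES:
    ratio = Fraction(width, height)
    crop_w, crop_h = crop_to_16_9(width, height)
    print(
        f"{width}x{height}: ratio {float(ratio):.4f} "
        f"(16:9 is {float(TARGET):.4f}), exact crop {crop_w}x{crop_h}"
    )
```

Only 1280x720 is exactly 16:9. The Phoenix sizes reduce to 37:21, 43:24 and 23:13, and their largest exact 16:9 crops are 1184x666, 1360x765 and 1472x828, so only a few rows or columns of pixels need trimming.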
For this Style, the Phoenix Models produced flying people, but the others differed: LLightning and Flux Dev made flying beetles, Flux Schnell made birds, while Illustrative Albedo and Lifelike Vision produced lots of balloons but nothing with wings.
Wrap-Up
The Leonardo AI Models definitely have minds of their own. Only images produced by Phoenix 0.9 had both elements, flying people and towering buildings with landing pads at upper levels. Few other Models had the buildings as I had asked. Generally, the images are somewhat inspired by the prompt, sometimes rather distantly. It would take a lot more investigation, letting the Seed be randomly produced, to see whether any particular Model+Style is capable of better compliance.
I use art generating software as a "commissioned artist". A few of the Models in Leonardo, the more costly ones, are reasonably compliant. Programs other than Leonardo vary in their Prompt Adherence, and Dall-E3 is probably the best. The rest of Leo's Models, in most of their Styles, are fun to experiment with, but are unlikely to yield results that well match anything but the simplest queries.
I did not test a rather new way of using Leonardo AI, Flow State. It has a different way of doing things. Neither did I turn on "AI Enhancement" for my prompt in those Models that offer it. What we have here is complex enough already.
The folders of images I produced are a useful Gazetteer of possibilities that I can use in the future to select an image generation routine.