Thursday, January 30, 2025

Training matters – a lot!

 kw: experiments, simulated intelligence, generative art, photo essays

The following two images share something important:



What is that something? It is this prompt:

The essence of ingenuity

I selected the first image because it is the best of the images that include a person, and the best overall. The other is shown because it is the best of those that do not include human figures (the great majority). These, and other images to be seen below, share one other, possibly equally important attribute: each is the cream of between ten and forty images generated by that prompt in a particular art generator, as influenced by various settings offered by the different programs.

I used the prompt above with five art generators: Dall-E3 (Bing's Image Creator), DreamStudio, Gemini (which runs Imagen 3), ImageFX (Imagen 3 running in Google Labs), and Leonardo AI. I frequently use one or more of these programs to produce wallpaper for my computer, which has HD screens with 1920x1080 pixels. Thus, these were all produced with a "16x9" setting. In another post I'll get into the actual pixel ratios the programs use and how I cope with that.

My current session resulted in 22 images. When I noticed I had substituted "innovation" for "ingenuity" while working with DreamStudio, I went back and worked with it some more, producing four more images.

My workspace is the Downloads folder. I run Google One and my main storage area for images is OneDrive. I work in Downloads to keep intermediate or early files that may be discarded later from being transferred to OneDrive or backed up. Once I have a final set of images to keep, I move them to permanent folders in the Pictures area.

A montage of six of the images will help me describe other aspects of my methods:


We have here a screen capture from the Downloads folder. Each program has its method of naming a file. Dall-E3 (DE3) uses a long string of alphanumerics, which probably encode the binary seed and other information. Right after I download an image I rename the file according to a scheme I worked out that includes an abbreviation for the program name, the date, a sequence number, the prompt, and a prompt modifier. In the cases shown here the modifiers, if used, are suffixes. For three of them the modifier is "retro panavision", and you can see that the program responded with Steampunk cameras in various settings. For two images I added "cinematic landscape", which yielded a filmlike atmosphere. Then "digital art", which was the first suffix I tried, also produced a steampunk vibe, and included some persons. Note that each of these images is one of many; others weren't kept. Let's look at some more:


DreamStudio includes the seed number in the file name, so I retained it. It also includes the first 30 characters of the prompt. Note the word "innovation", an error on my part. Thus the lower three images are a kind of ringer. We'll come back to DS. Above them are the first three items from ImageFX. When you download one of these, the file name is just "image_fx_". If the file isn't renamed, the next one is "image_fx_(2)", and so forth. ImageFX doesn't keep your history. It is free and there is no paid version.
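A renaming scheme like the one described above (program abbreviation, date, sequence number, prompt, modifier) could be automated along these lines. This is only a sketch: the function name `build_name`, the underscore separators, the YYYYMMDD date format, the two-digit sequence number, and the `.png` extension are all assumptions made for illustration, not the exact scheme used for the files shown here.

```python
import re
from datetime import date


def build_name(program, day, seq, prompt, modifier="", ext="png"):
    """Assemble a file name from program abbreviation, date, sequence
    number, prompt, and an optional prompt modifier (hypothetical layout)."""

    def slug(text):
        # Lowercase the text and collapse every run of characters that is
        # not a letter or digit into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [program, day.strftime("%Y%m%d"), f"{seq:02d}", slug(prompt)]
    if modifier:
        # The modifier goes last, mirroring its use as a prompt suffix.
        parts.append(slug(modifier))
    return "_".join(parts) + "." + ext


name = build_name("DE3", date(2025, 1, 30), 3,
                  "The essence of ingenuity", "retro panavision")
print(name)
```

Keeping the modifier as the last field means that sorting a folder by name groups the files by program first and then by date, with the prompt and modifier still readable at a glance.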

The image I kept from running it without a modifier shows five apparently ingenious people doing stuff. When I added "digital art", I got abstract art. I downloaded four of these, two shown here, and two in the next montage:


Three of the abstract images look like impressionistic evocations of a galaxy. ImageFX offers buttons to push to add modifiers, or you can type your own. "Cinematic landscape" yielded a cityscape in a valley; I've noticed that many pictures made from a prompt that mentions "landscape" have the "X" arrangement we all learned in kindergarten. The upper arms of the X are the flanks of mountains and the lower arms outline a river valley. Finally, "retro panavision" produced, not a camera, but an ingenious tinkerer in his workshop. This is my favorite from this project, which is why I set it at the top of the post. Another montage:


These are the Leonardo AI offerings. This program will of course pay attention to modifiers or suffixes as normal parts of the prompt. It offers a rich set of Style and Substyle settings, which I abbreviated in the early part of the file names. Note that five of these images have a Steampunk look. That's not just because I like Steampunk, but because with certain Styles, with this prompt, that's nearly all that was offered. I really like the produce market image. I consider it "most different" from the others.

We round out the experience by making up for the error I made when I ran DreamStudio the first time. Here is the final montage:


With DS, one can add modifiers, and there are also 16 Styles you can set with a button, including "Enhance" and "Digital Art". Without a Style set, we get kind of clunky Steampunk, at upper left. With "Enhance", the results were all architectural! "Cinematic" yielded a Steampunk vibe on a city-wide scale.

Looking at all these makes me wonder what training data was used for each of the programs. I also wonder if the preset Styles in some of them have training subsets associated with them. All I can do at present is to continue to experiment, so I get a feel for the kind of vibe or atmosphere I want a picture to have.

These programs all had very different training sets. Short prompts like the one used here bring this out best. However, even very long prompts cannot force conformity. There is just too much latitude in an image for every detail to be pinned down. This is one reason I keep lots and lots of images from these experiments. When I am considering a picture I want to produce, I can look through my "collection" as an archive, a library of "looks".

But…nothing beats playing around with the prompt's wording, suffixes, modifiers, Styles, etc., etc. I'll show one more favorite, a bigger version of one of these last four:


It's kind of compelling…
