Polymath at Large: An experiment with a versatile AI artist

kw: artificial intelligence, automatic art, evaluations, instructions, photo essays

I have been using Dall-E2 for a couple of years. Recently Dall-E3 became available at no cost in Bing. However, the free version doesn't have outpainting. I went looking to see which automatic art products do outpainting, and I found Playground. There are two versions of the same product, apparently aimed at quite different audiences. The one I use is at playground.com. It is also available by typing www.playgroundai.com, which can be confusing because the alternate version is at playground.ai!

As with any free version of a paid product, there are limitations. In this case, they are:

500 images (or "generate" actions) per day.
3 Canvases
You're running on a slower server

Actually, I think the last issue is that paying users get bumped ahead of free users in the server queue, so at busy times, you can wait quite a long time.

For more than simple image generation, there are a lot of controls! After getting basic experience, I decided to experiment with them. Here I will focus on one: Filters. These two images were generated without any filters, using the same Prompt and Seed, but different Models (more on all these soon).

Getting Started

You log in by connecting to a Google account (you must have one). Then you see this control screen:

I already produced a couple of images, to be discussed at the very end. To take the screen components in order:

At top left is the Playground Logo. Clicking the "v" next to it opens a submenu, including Logout.
So far, I have ignored the three controls at top right.
Below the Logo we see that we are in Board view. Images generated here are ephemeral and vanish when you log out. Images in Canvas are kept, and more things can be done to them. For our purposes here, I used Canvas in a limited way, which I'll describe soon. Since I downloaded all the images upon creation, I don't care if they vanish.
In that same row, there is a control to let you import an image. This is more pertinent to Canvas.
To the right of that, I have the Columns control set to two columns.
Down the left side, we first see Presets. This became available February 10 and I haven't used it yet.
Next we see Filters, which we'll be discussing in the bulk of this post.
Below that is the Prompt area. The prompt shown is "Gormenghast Castle"; the software doesn't like how the castle name is spelled, thus the red wiggleworm.
Next is Expand Prompt, which uses an AI engine (probably ChatGPT) to add a lot of words to the prompt you typed in. I haven't used it.
There are more controls available by scrolling, but they are not pertinent at the moment.
Down the right side, we first see Model. Three diffusion models are available: Stable Diffusion 1.5, Stable Diffusion XL (recently renamed; it was formerly Stable Diffusion 2.0), and Playground v2. SD 1.5 is going to be dropped in a few weeks, so I don't use it. I used the other two for this experiment.
Next one may choose Image Dimensions. I usually use 1024x1024; I'll discuss why a different one is highlighted at the end.
Prompt Guidance is next. I usually leave it at 7 when using Stable Diffusion XL, and at 3 (the default) with Playground v2. Lower numbers give the Model more freedom, and higher numbers instruct it to hew more closely to the Prompt. This makes more difference when the Prompt is long.
Quality & Details settings will make an image look more or less detailed and "finished". Paying users can use more than 30 steps.

The next image shows more controls found in the right side:

Refinement adds fine detail. Although I can set it higher than 15, Playground doesn't seem to like that; Generation times out. It probably works better for paying users.
The Seed is all-important. When producing multiple images (4 is the max in the free version), the "Randomize" box must be checked. When a single image is selected and the box is unchecked, the seed that's about to be used is exposed. You can type or paste in a seed up to 9 digits in length. I suppose that means that any particular prompt can produce up to 1 billion (10^9) variations.
The Sampler has a number of options, which I hope to experiment with on another occasion. "Euler a" is the default, and I haven't yet tried others.
Below, off-screen, is a Public/Private option which isn't available in the free version.
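
As a quick check on the size of the seed space, here is a minimal sketch. It assumes seeds are non-negative whole numbers of at most nine digits (0 through 999,999,999); that range is my assumption, not something Playground documents here, and the function names are made up for illustration.

```python
# Count the distinct seeds that fit in a field of at most nine digits.
# Assumption: seeds are non-negative integers from 0 to 999,999,999.
MAX_DIGITS = 9


def seed_space(max_digits: int) -> int:
    """Number of non-negative integers with at most max_digits digits."""
    return 10 ** max_digits


def fits(seed: int, max_digits: int = MAX_DIGITS) -> bool:
    """True if the seed can be typed into a field of max_digits digits."""
    return 0 <= seed < seed_space(max_digits)


print(f"{seed_space(MAX_DIGITS):,}")  # 1,000,000,000
print(fits(802745251))                # True: the seed used later in this post
```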

Getting the Seed

To get started, I had to determine a Seed. When you generate images in Board view, you don't have access to the Seed used. Images generated in a Canvas have the Seed shown when you click on them. Thus, I generated four at a time using the Stable Diffusion XL model in Canvas, tinkering with a simple prompt, until I saw an image I liked:

The Prompt I settled on is "Small town near a winding stream".
The Seed for my favorite is, as shown in the image above, 802745251. I used this seed for all the images shown below.

Back in Board view, I tested the Seed to be sure I had what I wanted. I saved the resulting image with the file name "901 Stws SDXL None", according to a numbering scheme I worked out beforehand. SDXL is my abbreviation for Stable Diffusion XL. I changed the Model to Playground v2 (PGv2 is my abbreviation), generated again, and saved the file with a similar name; this image pair (the same as that shown near the beginning of this post) is the result, clipped from an Explorer window.
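
For anyone who wants to keep file names consistent, a naming scheme like this can be sketched in a few lines. The field order (sort prefix, prompt abbreviation, model abbreviation, filter name) is inferred from the single example "901 Stws SDXL None"; the function name is hypothetical, and reusing the same prefix for the Playground v2 file is only an illustration.

```python
# Hypothetical helper that rebuilds file names in the style of
# "901 Stws SDXL None". The field order is inferred from that one example.
MODEL_ABBR = {
    "Stable Diffusion XL": "SDXL",
    "Playground v2": "PGv2",
}


def image_name(prefix: str, prompt_abbr: str, model: str, filter_name: str) -> str:
    """Join the four fields with single spaces, using the model's abbreviation."""
    return " ".join([prefix, prompt_abbr, MODEL_ABBR[model], filter_name])


# The unfiltered pair; giving both files the prefix "901" is an assumption.
for model in MODEL_ABBR:
    print(image_name("901", "Stws", model, "None"))
```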

The two images have similar composition, but differ quite a lot in detail. The two Models have different things they "pay attention" to.

Producing the Images

Now that I had a Prompt and a Seed ready to go, I proceeded to use filter after filter. First, there are 15 Filters that work only with SDXL. I used screen clips from an Explorer window with the View set to "Extra large icons", showing the icons in three columns. I gathered them by sixes. Comments follow each group of six.

The image file for the first Filter is numbered "02" because I used "01" for "None", and later renumbered it "901" to get the Explorer window to sort the way I wanted. These are all from a higher viewpoint than the "None" version. The icons in the Filter selector make it clear that they are designed to affect portraits the most.

Three of these resemble the first six. "Mysterious" has a look I like for some purposes, and it is back to the streamside viewpoint. "Niji SE" is also closer to "None" in its viewpoint, while "Pixel Art" shares the elevated viewpoint of most of them, but is blocky, as we would expect.

These last three are more similar to the first eight. Of all those with an elevated viewpoint, I like "Counterfeit" the best.

Now for the SDXL - PGv2 pairs for the other 23 shared Filters (prefixed A through W). They are presented in groups of four images, or two Filters per group, shown larger than those above. The last set has a bonus.
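
To see how this experiment sits against the free tier's daily limit, here is a rough tally. It assumes each filter was generated exactly once per applicable model and that every generated image counts as one against the 500-per-day allowance; retries and the bonus castle images are not counted.

```python
import string

# Figures taken from this post.
SDXL_ONLY_FILTERS = 15   # Filters that work only with SDXL
SHARED_FILTERS = 23      # Filters available in both Models
MODELS = 2               # SDXL and PGv2
UNFILTERED = 2           # the "None" image for each Model
DAILY_LIMIT = 500        # free-tier images per day

# Assumption: one image per Filter per applicable Model, no retries.
total_images = SDXL_ONLY_FILTERS + SHARED_FILTERS * MODELS + UNFILTERED
print(total_images)                 # 63
print(DAILY_LIMIT - total_images)   # 437 left for the day

# Letter prefixes for the shared Filters: 23 letters run from A to W.
shared_prefixes = list(string.ascii_uppercase[:SHARED_FILTERS])
print(shared_prefixes[0], shared_prefixes[-1])  # A W
```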

It is interesting that "Vibrant Glass" with SDXL has removed the village. With PGv2 it put the glass in the stream. The "Bellas Dreamy Stickers" Filter frequently puts a border around the image with SDXL, but seldom does so with PGv2. Its images would make good bookplates or logos.

"Ultra Lighting" produces a very pleasing look. "Watercolor", when looked at in detail, has a very definite painted appearance. Either of these images could be cropped, perhaps to a 4x3 ratio, doubled or tripled in resolution using a product like Upscayl (the one I use), printed and framed, and hung on your wall. Or it could be moved to Canvas for outpainting, etc.

With many prompts, "Macro Realism" produces extreme closeups that may or may not resemble the output of other filters. Here it only bows to the Macro world by having a narrow depth of field, especially with PGv2. "Delicate Detail" sometimes makes a big difference, sometimes not. Here it is similar to some of the others, but otherwise unremarkable.

"Radiant Symmetry" will sometimes produce very symmetrical mirror-image views; these are only approximately symmetrical. "Lush Illumination" is another of my favorites. Note that some areas of the images differ a lot between the two Models, but the cloud is very similar.

"Saturated Space" usually has a SciFi look. I note that SDXL put a boat in the water, while PGv2 put spaceships in the sky. I am not sure what the goal of "Neon Mecha" is, but at least in PGv2 it seems to evoke technopunk.

"Ethereal Low Poly", where "Poly" means "polygons", evokes older renderings on computers with limited memory. "Warm Box" is most similar to "Lush Illumination", with perhaps more geographic exaggeration. We're halfway through these…

"Cinematic" and "Cinematic Warm" evoke the look of movie sets.

"Wasteland" is, of course, saying, "Industry overdid it." "Flat Palette" uses a restricted range of hues, for a rather painterly look. I see that, while many of the Filters straighten out the "winding stream", at least a bit of curve is found here.

"Ominous Escape" seems similar to a couple of others. I don't know what it is for. "Spielberg" attempts to evoke film sets used by that great director.

I am not sure what the point of "Wall Art" is. For many of the prompts I've tried, the image is dominated by an exaggeratedly ethnic face, most frequently Black. Both of these images really could become a picture in your dining room. "Haze" is just that: a foggy morning look.

The last Filter is "Black and White 3D". What it does is obvious. Less obviously, with SDXL it produced a European look.

The bonus is the result when I tried the same Seed but no Filter with the Prompt "Gormenghast Castle", a stray memory from reading the oblique novel Titus Groan decades ago (read it at your peril; it ranges from obtuse to profane). The castle is supposed to be so huge nobody can explore it all in one lifetime. I see that in my haste I put STWS rather than SDXL in the file name on the left. This image looks like it could be a book cover. In the PGv2 image it looks more like I imagine Gormenghast Castle could have looked.

Way, way back near the beginning of this essay, in the screenshot of Playground, the SDXL image of the castle sits next to a taller image with a different look. The only thing I changed was the aspect ratio of the output image. I was hoping for a "book cover" look that is closer to the shape of a book, but it was not to be. That's the breaks when you commission an AI Artist that has a few billion (or trillion) little decisions hidden inside its operation.

The folder of image files I created for this post is a good reference I'll use when I have a particular "look" in mind for an image.

At least in terms of physical space taken, this may be my longest blog post!

Monday, February 12, 2024
