Monday, March 04, 2024

The new artist on the block

 kw: ai art, generated art, artificial intelligence, simulated intelligence, comparisons, photo essays

A year and a half ago I began to use Dall-E2 as my "hired painter". A favorite pastime has been generating landscapes, particularly for use as Zoom backgrounds. This image is one I have used a lot:


The prompt for this was "A calming forest scene with wildflowers in a meadow, a stream, and a small pond, landscape painting." I don't recall how many times I ran the prompt, probably no more than twice, before I saw a 1024x1024 pixel square I liked, shown here. Then I outpainted (extended) it. The images Dall-E2 produces are PNG files, which are 5-7 times as large as a JPG saved with a 95% quality factor.

The result is 2624x1472 px, including the color bar used by Dall-E to identify its products. I cropped out a 2572x1447 portion, which is very close to the 16:9 aspect ratio needed for HD wallpaper. (As with any Blogger image, you can click on these to see them full size. The first image was reduced by about half from the original.)
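As a rough check of that "very close to 16:9" remark, here is a minimal Python sketch of the arithmetic. The function name `aspect_report` and its dictionary keys are my own labels for illustration, not part of any image tool.

```python
from fractions import Fraction


def aspect_report(width, height, target=Fraction(16, 9)):
    """Compare a width x height crop against a target aspect ratio."""
    ratio = Fraction(width, height)
    return {
        "ratio": float(ratio),
        "target": float(target),
        # How far the crop's ratio is from the target, as a fraction of the target.
        "relative_error": float(abs(ratio - target) / target),
        # Width an exact target-ratio crop of this height would have.
        "ideal_width": float(height * target),
    }


report = aspect_report(2572, 1447)
print(report)
```

For the 2572x1447 crop, the ratio is off from 16:9 by under 0.02 percent, and an exact 16:9 crop of the same height would be about 2572.4 pixels wide, so the crop is as close as whole pixels allow.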

Just a few days ago I got a notice from Bard, Google's version of ChatGPT, that its name was being changed to Gemini, and that it could now generate images. In the past few months I also got access to a free version of Dall-E3 through Bing, and discovered another AI image program called Playground, which I've written about recently.

When I started to use Dall-E2 in 2022 I also tested the other two "legacy" AI art generators, MidJourney and Stable Diffusion. I found them more limited than DE2, and they are more expensive to use, so I have ignored them since then. I recently took another look at MidJourney, but it runs as a Discord service, and I find Discord hard to use; it's also still too expensive. I'll say more about Stable Diffusion in a moment.

I decided to test the products that I do use on the same prompt. Later I added another prompt, and we'll come to that.

First, I ran the "calming forest scene" prompt with Dall-E3 a few times, and picked the square shown here as the one most pleasing to me. DE3 doesn't yet do outpainting (at least not in the free version). Where the free version of Dall-E2 allows 15 free "Generate" steps per month, the free Bing version of Dall-E3 allows 15 per day. Running a prompt in Dall-E3 yields four square JPG images.

It is immediately clear that this image is more detailed, while retaining the look of a painting. Also, there is no color block or other "signature".

Both versions of Dall-E adhere pretty well to the prompt. Shorter prompts result in more variety. Prompt construction and editing become tools to negotiate with the product to get an image you want.

Secondly, I ran that prompt with Gemini. Since Gemini is also a chatbot, one must say, "Create a calming forest scene…", for example. You can ask Gemini for more suggestions, and get its help producing a prompt. Gemini also can produce four results per prompt, but sometimes it gives only two or three. At the moment, you cannot ask for human figures to be included; Google got in trouble when its early release of Gemini images yielded nearly all "minority" (non-Caucasian) faces.

The default size of Gemini images is 1536x1536, and they are JPG files. I reduced this one to 1024x1024 to compare with the other programs.

The level of detail is between that seen for Dall-E2 and Dall-E3. There is also a painterly look. I haven't tried asking for photographic detail.

Gemini claims that you can ask it to produce images of other sizes, from 256x256 to 1536x1536, and other ratios, such as 1024x576 (an HD ratio), but when I included size instructions in the prompt, I still always received 1536x1536 squares. Queried about this, Gemini said the capability was not yet there. I'd love to be able to produce 1920x1080 images from the get-go, but that has to wait.
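Until then, one workaround is to crop a 16:9 strip out of the square, as I did with the outpainted Dall-E2 image. The following is only a sketch of the arithmetic, assuming a centered crop; the function name `center_crop_box` is hypothetical. It returns the box as (left, upper, right, lower), which is the order Pillow's `Image.crop` accepts.

```python
def center_crop_box(width, height, ratio_w=16, ratio_h=9):
    """Return the largest centered ratio_w:ratio_h box inside width x height."""
    if width * ratio_h > height * ratio_w:
        # Image is wider than the target ratio: height is the limit.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Image is taller than (or equal to) the target ratio: width is the limit.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


box = center_crop_box(1536, 1536)
print(box)  # (0, 336, 1536, 1200)
```

That box is 1536x864, which is smaller than 1920x1080, so the crop would still need to be enlarged to fill an HD screen, and nearly half of the square is thrown away.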

Now, with Playground there are complications. Playground has a "side version" called Playground.AI that can produce images that are 1920x1080 and a wide variety of other sizes, but after about 20 prompts a countdown reaches zero and you need to subscribe. I haven't checked whether more images become available after a month or so; I had other issues, so I stopped using it. The "big version" at playgroundai.com has so many controls and options that it is hard to pick a single "default". For example, there are presently three Models: Stable Diffusion XL (they once included Stable Diffusion 1.5, but have dropped it), Playground v2, and Playground v2.5. There are also Samplers, which are mathematical methods used for the Diffusion operation, as many as 12 in the paid version and 8 in the free version. There are dozens of Filters that affect the look of an image in ways ranging from subtle to dramatic. You can also turn on or off the random number generator used to produce a different Seed for each image.

To simplify things for this experiment I let the Seed be random, I didn't use any Filters, and I used the following setups to produce five images, selected from groups of four under these conditions:

  1. SDXL (Stable Diffusion XL) with DPM2 Sampler
  2. SDXL with Euler a Sampler (Euler a is the default when you start to use Playground)
  3. PGv2 (Playground v2) with DPM2 Sampler
  4. PGv2 with Euler a Sampler
  5. PG25 (Playground v2.5), which doesn't use an explicit Sampler (of course there is a Sampler buried inside somewhere)
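For anyone who wants to keep notes on a test matrix like this, the five setups can be written down as a small Python list. This is only a bookkeeping sketch: the function name `build_setups` and the dictionary keys are my own labels, and nothing here talks to Playground itself.

```python
from itertools import product

# Models that expose an explicit Sampler choice, and the two Samplers tested.
MODELS_WITH_SAMPLER = ["SDXL", "PGv2"]
SAMPLERS = ["DPM2", "Euler a"]


def build_setups():
    """Return the five setups in the same order as the numbered list."""
    setups = [
        {"model": model, "sampler": sampler}
        for model, sampler in product(MODELS_WITH_SAMPLER, SAMPLERS)
    ]
    # Playground v2.5 has no user-selectable Sampler, so record None.
    setups.append({"model": "PG25", "sampler": None})
    return setups


setups = build_setups()
for number, setup in enumerate(setups, start=1):
    print(number, setup["model"], setup["sampler"])
```

Note that this is not a full grid of three Models by two Samplers, because PG25 appears only once.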

SDXL with DPM2. This and the following four images have quality and detail equal to DE3. This has a brighter look overall, including a little drama in the sky. I'll compare its siblings below with it.
SDXL with Euler a. Euler a in general has a softer look than DPM2.
PGv2 with DPM2. This is even brighter than SDXL, even a bit edgy, though the sky is more bland. I like the misty aspect of the background in all of these, but in this one it is more pronounced.
PGv2 with Euler a. A little softer, as before, but this one also has a better sky.
PG25. This is even brighter than the others, almost too bright. All five of these have a greater aesthetic quality than DE2 and Gemini, and they match DE3 in aesthetics while having a rather different feel.









They are all beautiful. But we are only half done here. All these art generators have the characteristic that they produce more widely diverse images when given very short prompts. Diverse not only from one product to the next, but from one image to the next in any product.

I happened to be reading a book about simulations in cosmology, so I decided to use a one-word prompt: "Cosmology". This time I ran the prompt twice with each product. Here, for each product and variation, I'll show all four responses to each run of the prompt. Dall-E2 is first:



These are screen shots of the image sets Dall-E2 produced. There's lots of variety from image to image, not just of subject matter but of style. Most of these have a specific focus: planets, galaxies, etc. The last image, at lower right, seems to have a wider scope, and most closely evokes "cosmology".

The next two sets of four are from Dall-E3, which displays its results in a block rather than a line, and with a black background:



Here, each set of four has a common theme, but the theme varies, as does the style, from one set to the next. Both sets have a galactic focus, but in the first set, two appear to host quasars.

Next is Gemini.



The first set is similar in concept to Dall-E2, with greater diversity. Its first image seems to best encapsulate "cosmology", having great scope. For the second set, Gemini produced only three images, all similar. One may click "Generate more", which I did, and it came up with two more images, one of which is shown here. The one not shown is different from all the others.

Now we turn to Playground, which produces groups up to four in a line. First, SDXL with DPM2 Sampler:



Six of these echo historical, pre-Enlightenment-era concepts of the Universe in sundry ways. I'd say that the first image in the first set best illustrates the prompt. It seems to segue from Earth to infinity. Only the second image in the second set includes something vaguely like a galaxy.

Switching to the Euler a Sampler produced these:



Five of these bear some resemblance to the six "historical" images above, but the overall feel of these is different. None of these really fits the scope of the prompt.

Next we'll see PGv2 with the DPM2 Sampler:



These have less overall variety. The last image of the eight seems to go the farthest "out there". It's curious that PGv2 put people in nearly every image.

Now for the switch to the Euler a Sampler:



Wow! The first of the images would make a great bookplate. The rest are similar to the sets with the DPM2 Sampler, with the same tendency to include a person, if not persons. The third image in the second set is the closest of these eight to the "cosmology" concept, as I envision it.

Finally, let's see what PG25 does:



I still see a person or two, but all of these better approach the prompt in concept, with the eighth perhaps being the best. It is interesting that, with the one exception noted above, the images produced by Playground don't show things that look like galaxies. 

I see this collection of images as a catalog of "looks" I can refer to when choosing the way I want a generated image to appear. It is evident that Playground has the greatest variety of ways it can respond to a prompt, but that each of the other three art engines has something unique to offer.
