Wednesday, December 18, 2024

SI struggling to get cats right

 kw: generative art, experiments, cats, simulated intelligence

I spent an interesting hour several days ago "persuading" a few generative art programs to make an illustration for a story I planned to review. In the story, a ship's cat (on a spaceship) is patrolling, as cats do, and sees someone (a saboteur, but the cat doesn't know that) release many tiny spiderlike robots from a bag. They are intended to disable the ship. The cat hunts them down and destroys every one (other things happen to the saboteur). I tested various prompts, and settled on this one:

A white-and-black cat fiercely pouncing on small mechanical spiders in a spaceship cargo bay

My first tests were with Dall-E3 in Bing ("Bing Image Creator"). This image, the best of twenty or more images, was the closest to what I was looking for, up to this point.

There is just one significant problem: The cat appears to be allied with the spiderbots, not attacking them.

Other than that, it's a very good image.

I usually find that Dall-E3 adheres most closely to the prompt, but this time I wasn't satisfied, so I went on to Gemini, which uses Imagen 3 when you ask for an image.

Imagen 3 produces only a single image, and that image is always square. If you ask Gemini how to ask for a wide format image, it gives instructions, but they don't work. If you just ask, "Please create the same image in a wider format," it will respond, "Sure, here you are," but it will produce another square image, usually very similar. Oh, well.

Here is the image Gemini offered. It was square, but with enough freeboard above and below that I could crop it to a 4:3 aspect ratio. It is the one I ended up using. There are a couple of anomalies, however. Take a close look at the paws. The left paw (the raised one) has the dewclaw much too far forward, like a thumb, and there is a sixth digit. Also, one of the digits on that paw, and the corresponding digit on the right paw, have double claws.
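For anyone who wants to repeat the square-to-4:3 crop, here is a minimal sketch of the arithmetic in Python. The function name `center_crop_box` and the 1024-pixel image size are my own assumptions for illustration; I don't know the actual pixel dimensions Gemini returned, and a centered crop is only one choice (you may prefer to trim more from the top or the bottom).

```python
def center_crop_box(width, height, ratio_w=4, ratio_h=3):
    """Return (left, top, right, bottom) for the largest centered crop
    of a width x height image with aspect ratio ratio_w:ratio_h.

    Hypothetical helper for illustration; integer math only.
    """
    if width * ratio_h > height * ratio_w:
        # Image is wider than the target ratio: keep full height, trim the sides.
        new_w = height * ratio_w // ratio_h
        new_h = height
    else:
        # Image is taller than (or equal to) the target ratio:
        # keep full width, trim the top and bottom.
        new_w = width
        new_h = width * ratio_h // ratio_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


# Assumed example: a 1024 x 1024 square image cropped to 4:3.
print(center_crop_box(1024, 1024))  # (0, 128, 1024, 896)
```

For a square of that assumed size, the crop keeps the full width and removes 128 pixels from the top and from the bottom. The resulting tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, if you use that library for the actual cropping.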

An image of a barefoot man that Dall-E2 produced for me a couple of years ago has six toes on one foot and seven on the other. These sorts of errors show that the programs aren't just modifying images of real cats or cartoon humans, but doing something deeper. However, they demonstrate that there is no understanding of the actual nature of cats or humans…or anything else, for that matter.

I went on to ImageFX, part of Google Labs, which I understand uses Imagen 3 also, but it seems to have more flexibility, including settings for aspect ratios. A slip of the mouse set the aspect ratio to Portrait for this one. I kind of like the tiny screens on the spiderbots. As with the Dall-E3 image, it isn't clear whose side the cat is on.

I tried again with a wide aspect ratio setting. This one came out 10:7. This is the only image of the entire set that has the cat in mid-pounce. But is it pouncing on the spiders?

If you look carefully you can see a number of flying spiders.

Hoping to get the cat attacking spiders without ambiguity (though the Gemini image is quite good), I turned to Leonardo AI. Bingo!

This image used the Leonardo Phoenix preset and the Moody style. The cat is not exactly pouncing, but at least it is looking at its prey. I almost used this image, but there is distortion in the hip area, and the even moodier tone of the Gemini image led me to choose that one.

In all, I generated about 100 images to get one I could happily use, and these few others that are "almost there," and interesting in their own right.
