kw: ai experiments, simulated intelligence, automatic art, comparisons, generated art
I like caves. In the post "Troglodyte Fantasy" I reported on a project to generate images of about a dozen rooms built into cave spaces, using two different art generators. I have since experimented with several others, and I conclude that these programs vary significantly in how closely they comply with the details of a prompt. I used particularly long prompts for this project. Here is the first one, which was intended to reify my idea of a "man cave":
A room in a spectacular cave that has many stalactites and stalagmites, with flowstone on the room's walls, fitted out as an office with a desk and chair and desk lamp and two large computer monitors, with a bookshelf full of books to one side and two smaller side chairs.
Note that the room inventory is one desk, one desk chair, two side chairs, a desk lamp, two computer monitors, and one bookshelf. The milieu is a cave as described.
So far, I have produced images for all the rooms using three programs: Leonardo AI, ImageFX, and, most recently, DreamStudio. I also produced several versions of the Cave Office image using Gemini, Dall-E3, and Playground. The degree of prompt compliance varies considerably, both from program to program and among the various "styles" or other toolsets within a single program. I show some findings below, starting with the four programs that I managed to "persuade" to hit nearly all the goals. Here is an image montage:
DE3: Dall-E3 – Everything is there, plus an extra bookshelf and several extra lamps in addition to a tiny desk lamp. We also see a view outside the cave through an archway.
DS: DreamStudio – There is only one side chair, but it adds a bonus monitor and a floor lamp. However, it took dozens of attempts to get this one.
GEM: Gemini – No side chairs, but everything else is there. The desk lamp is off, and the cave in general is the darkest one of these four. This was cropped from a square image.
IFX: ImageFX – Everything is there, plus an extra bookshelf and an extra desk lamp. I understand that both Gemini and ImageFX use Imagen 3 to generate images, but something behind the scenes, perhaps the training set, must differ between them.
The other two programs have numerous "style" settings, so in the second montage I showcase two variations for each program:
Leo: Leonardo AI. On the left, style and substyle "Phoenix" and "illustration", which explains the drawn appearance. Everything is there, although the two lamps stand beside the desk rather than on it, so there is no real desk lamp. I am not sure what the green tree in the corner is doing there! "Phoenix" is billed as being especially compliant with prompts.
On the right, style and substyle "Lightning" and "vibrant", so color and contrast are enhanced. It's hard to see where a second monitor might be. Everything else is there, with added chairs, tables, and table lamps, like a mini-conference sidebar. Note that Leonardo AI charges different numbers of credits for different styles: Phoenix costs 2.4 times as much as Lightning, while most other styles cost 1.4 times as much. Lightning itself is promoted as fast and cheap.
PG: Playground. On the left, using the SDXL (Stable Diffusion XL) engine, probably version 1.0. There is only one side chair, but an extra bookshelf opposite, and a smaller bookshelf at the far end of the room.
On the right, using the PG30 (Playground 3.0) engine, which is billed as "very compliant to prompts". That is apparent here: everything is there, with nothing extra. Sadly, Playground has dropped its image generation interface and announced it is moving into graphic design. I'll miss it. It had the most options, but also a steep learning curve.
This comparison doesn't go very deep into the use of these programs. At present the only program I have paid for is DreamStudio, because it has a pay-as-you-go plan, similar to the one Dall-E2 had. The others have various subscription plans, which I avoid. I haven't tried editing or outpainting with any of these except Playground.
I could probably edit an image to add something I think is missing, but I prefer to get an image that is closer to what I want from the start, so that little or no editing is needed. In the past I used outpainting to turn a square image into a wide-format image. That is no longer necessary, except with Gemini, and even there it can be avoided: when asked for "wide format", Gemini produces a slightly zoomed-out square image that can be cropped, and its original images are 2048x2048 pixels, which leaves plenty of resolution after cropping.