Monday, March 25, 2024

I am so glad to learn that Copy Editors still exist

 kw: book reviews, nonfiction, copy editors, copy editing, punctuation, word usage, grammar, memoirs

Mary Norris is one of those wonderful people who make sure that "everything that's fit to print" is also "fit to read". Her 2015 book is Between You and Me: Confessions of a Comma Queen. She is a long-time copy editor for The New Yorker. One of her mentors actually kept a "comma shaker" as a decoration at her desk, to emphasize that the style favored by TNY incorporates plenty of commas, but not too many. A big bone of contention between different schools of copy editing is the serial comma. Some know it as the Oxford comma, apparently because at one time it was universally used in publications of the Oxford University Press (not so much these days, I understand). 

And just what is this mysterious comma? It is the one used before "and" in a series such as "red, white, and blue". Most publications these days don't use it, preferring "red, white and blue". However, while it is OK to leave the comma out of that particular list, consider the sentence, "The singer was accompanied by his two ex-wives, Kris Kristofferson and Waylon Jennings." Here, a comma before "and" would make it clear that the two men mentioned are not former wives of the famous singer! The list refers to four persons accompanying him.

So, copy editing. You might think the subject is going to be dull. I assure you, it is not. In this author's hands, it's delightful.

Any memoir begins with an obligatory chapter or two about one's early life. By the end of Chapter 1 the author has been accepted into the august halls of TNY, and she can get down to the business of regaling us with stories of solecisms and how they are exterminated…or not. Here we meet the formidable Lu Burke and her comma shaker, and others who guided Ms Norris's career, sometimes catching her slips, and sometimes grinning in chagrin when she catches one of theirs. Several pairs of eyes scrutinize a manuscript or galley proof before it ever hits print: the work's author (and often several friends and relatives), the intake editor (the one an author might think of as "My editor"), the typesetter (although these days most authors submit electronic text, sidestepping the typesetter), the copy editor, a supervising editor, and someone in the print shop who at least glances over the text while making sure the layout is proper.

Many of the details of a copy editor's job, as described throughout the book, are relevant to a time before word processing and the ubiquitous PDF file: Lots of hand markup, and markup of markup, and an author's "STET" where he or she doesn't want a certain change to be made (I had to do a ton of STET's while negotiating with a copy editor for a British journal, who tried to de-Americanize my text, while I wished to retain my own voice, not sounding like a warmed-over Brit). It isn't clear to me how all these revisions and re-revisions are handled electronically. I should ask my brother, who publishes a book every few years!

[For this image I had no success getting Playground/Stable Diffusion to show a comma shaker, but a different query popped out this kettle of question marks]

The author's career at TNY really began when she caught an error all others had missed: the word "flour" spelled as "flower". Homophones are hard to catch, and are so far impossible for "spell checkers" in software to detect. They're one big reason we still need copy editors. Some chapters are devoted to grammatical solecisms ("solecism" is the grammarian's term for "sin"). Others are devoted to punctuation marks and their use and misuse. She alternately rhapsodizes and agonizes over the way one must edit poems by Emily Dickinson, who used dashes of several lengths—sometimes half a line—in place of most other punctuation. Opinions vary as to how many "buckets" to use for her dashes. The usual set, more than plenty for most of us, numbers four (a quick code sketch of their Unicode code points follows the list):

- Hyphen, also used for a preceding minus sign, sometimes.

– En dash (the width of a capital "N"), usually used for the minus sign; it's the same width as the +.

— Em dash (the width of a capital "M").

—— Long dash, usually entered as two Em dashes strung together (though Unicode does define a single two-Em-dash character). Entered as a pair, it runs the risk of the two Em dashes being separated at the end of a line.
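For the curious, here is a minimal Python sketch of those four "buckets" and their Unicode code points (the long-dash entry assumes the single two-Em-dash character rather than a pair of Em dashes):

```python
# The four dash "buckets" and their Unicode code points.
dashes = {
    "hyphen-minus": "\u002D",   # -
    "en dash":      "\u2013",   # –
    "em dash":      "\u2014",   # —
    "two-em dash":  "\u2E3A",   # ⸺ (often typed as two em dashes instead)
}
for name, ch in dashes.items():
    print(f"{name:>13}: {ch!r}  U+{ord(ch):04X}")
```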

Plus, there are rules at TNY about when to surround a dash with spaces, and when not to do so. Other punctuation marks that get lengthy treatment include the semicolon, colon and apostrophe. A few pages are devoted to dangling participles, such as, "Over tea in the greenhouse, her mood turned dark." It wasn't her mood that was hovering over the tea, so the sentence was rewritten, "As we drank tea in the greenhouse, her mood turned dark." That made it clear that one of the tea drinkers possessed the darkening mood (or was possessed by it).

In these reviews I have noted when I've found more than one or two evident errors in a book's text, and I have complained about the evident lack of competent copy editors (often, total lack…). I note that I have used an ellipsis (…) a couple of times. Ms Norris tells us of a writer who uses the ellipsis almost to the exclusion of other punctuation marks. It lends a breathless quality to the writing, and one hardly knows where to pause and recollect one's thoughts. It also leaves a copy editor quite unable to do much.

I cannot close without adding my 2¢ to the chapter on "you and me", which these days is so frequently replaced with "you and I", as in, "My brother called for you and I." Hardly anyone would write, "My brother called for I." Rather, we would write, "My brother called for me," which is the clue that the pronoun is the object of "called", not the subject. A side issue is that, while my generation was taught to always mention oneself last, that practice has also been gradually eliminated. Now we hear, "such-and-such happened to me and him," for which at least the pronouns are back in the correct (objective) case. I was taught that anything other than "…to him and me" is impolite. My 2¢? AMEN! The book ends with the author, at a memorial event for Lu Burke, having mentioned "Alice or me." Correct, right up to the finish.

Wednesday, March 20, 2024

...and just who might be listening back?

 kw: book reviews, nonfiction, SETI, extraterrestrial life, philosophy

Seth Shostak is an idiot…or to be more charitable, he is dramatically misled by his own idealism. He is a very prominent proponent of sending messages toward possible alien intelligences, and he is the senior astronomer for the SETI Institute. He minimizes the possible risks we face from becoming known to "the Universe". If someone is out there listening, how will they react to learning that we exist?

In The Contact Paradox: Challenging Our Assumptions in the Search for Extraterrestrial Intelligence, author Keith Cooper lays it out plainly. He sums it up nicely in a paragraph on page 295:

We search the Universe for evidence of extraterrestrial life to make contact with others, for humanity to be able to share the Universe with others. Yet we find ourselves in a position of not being confident about whether we should try and make contact.

We dream of learning wonderful, life-changing things from superintelligent, hyper-advanced space aliens, or ET's (ExtraTerrestrials). All too often, the idealists in particular ignore the fact that every human being is both good and evil. Under some circumstances, we are altruistic, even heroic. Under others, every single one of us is capable of murder and larceny. There are no exceptions. Can we expect anything better of the denizens of another solar system? What are the chances of any intelligent species that arises due to evolutionary processes becoming unfailingly altruistic not only among themselves, but toward others to whom they are not remotely related?

The author makes a deeper point: by searching for "others" we search for ourselves. We project our hopes and dreams on them. It reminds me of a Chinese parable:

A man prayed daily for the Dragon to come. He dreamt about meeting the celestial being, imagining the wonderful things he might learn. One day there was a knock at the door; really, more of a crashing sound. The man opened the door and he saw him: scaly, fiery orange and red, forty feet long, with eyes the size of saucers and teeth like daggers. He screamed in fright, "Who are you?" The voice hissed and roared through him, "I am the Dragon. Am I not what you wanted?"

More succinctly: Be careful what you pray for; you just might get it.

A rather unbalanced segment of American society (few people elsewhere are as enamored of ET's as Americans) lives in combined fascination and fear of "flying saucers" and the "space aliens" that might "abduct" them to do "genetic experiments". Such sexual anxiety says a lot more about these people than it does about ET's. Is it really possible that Earthly genetics can have anything to do with ET's?

Sidebar: People with biological education (whether schooled or self-taught) usually know the Central Dogma of Genetics: DNA→RNA→Proteins. More recently, it has become clear that more than half of the DNA for which we know a function does not follow this dogma directly, but is regulatory, and modifies what happens when a "coding gene" is expressed to RNA and then to a protein. The key to how this works is the coding table, or The Code, which translates 64 codons into 20 amino acids plus a "stop" signal.
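For the technically inclined, here is a toy Python sketch of what a coding table does; only a handful of the 64 codons are included, with assignments taken from the standard table:

```python
# A tiny slice of the standard genetic code, just enough for the example.
CODE = {
    "ATG": "Met", "GCT": "Ala", "AAA": "Lys",
    "TGG": "Trp", "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}

def translate(dna: str) -> list[str]:
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return [CODE.get(codon, "?") for codon in codons]

print(translate("ATGGCTAAATGA"))   # ['Met', 'Ala', 'Lys', 'Stop']
```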

A little known fact about The Code: There is one Standard Code for the nuclear DNA of nearly all eukaryotic organisms, and it is also used for the DNA of most prokaryotic organisms (bacteria and archaea, which are bacteria-sized). But on Earth there are 25 other codes! See the details at this NCBI site. A few of these alternate codes are used by simple protozoan creatures, while many are for the mitochondria in various eukaryotes, and the rest are for various families of prokaryotes. This is just for Earth life. How many alternate codes are possible? Will any aliens out there have DNA like ours, and if so, will it be based on "our" Code?

The calculation P[64:21] (assigning the 64 codons among the 20 amino acids plus the stop signal) yields about 2.1×10³⁶ permutations. In common terms that is 2.1 trillion trillion trillion. That is one estimate of the number of possible DNA-to-Protein Codes. However, while there are as many as six codons per amino acid, the codons are grouped, so the number of efficient DNA codes may be smaller by a factor of a billion or so. Even then, we are left with a 28-digit number. It is extremely unlikely that an alien species from any other star system will have any genetic similarity to Earth life. Further, there is no guarantee that the same 20 amino acids will be used everywhere. Given the hundreds of possible amino acids, it is more likely that alien species would have hardly any proteins that are "compatible" with any of ours. The aliens may not even be able to eat Earth foodstuffs (ourselves included!).
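For anyone who wants to check that figure, a one-line computation reproduces it, assuming the count treats the 20 amino acids plus the stop signal as 21 distinct assignments drawn from the 64 codons:

```python
import math

# Ordered assignments of 21 "meanings" (20 amino acids + stop) to 64 codons.
count = math.perm(64, 21)     # 64! / 43!
print(f"{count:.2e}")          # ~2.1e+36
```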

The book has an illuminating and comprehensive history of SETI and a detailed discussion of "things to look for" besides radio signals: flickering laser or maser light, "biosignatures" (such as the presence of both oxygen and methane together), and technosignatures other than radio (such as anomalies in a star's light caused by immense clusters of solar arrays). There is no need for me to get into detail; it's hard to beat the author's writing!

The book is a great joy to read. We have a lot to think about as we consider Who Is Out There.

Monday, March 11, 2024

Creeping toward the Matrix

 kw: book reviews, nonfiction, cosmology, simulation, modeling

I was for several years, in my career of writing scientific software, the leader of a "Modeling and Simulation Group". One of our products had three sections, simulating first the geochemistry of crude oil generation from organic matter in deep rocks (up to a few kilometers), then the upward migration of the petroleum liquids through porous rocks, and finally their entrapment against nonporous, or less porous, rock layers to form oil and gas reservoirs.

I was sent to a few exploration offices to show off the software. In one instance, after the geologist set up access to a set of grids based on seismic data, I ran the software, which displayed the progress through time of oil and natural gas collecting under the trapping layer some half-kilometer beneath our feet. At the end of the run, he pointed to one green blob on the map, saying, "This is X field," and to another, "This is Y field." Then he pointed to a third one between them, asking, "But what is that?" I answered, "That could represent a lot of money." As it happened, the company had decided to sell that property to another company. That oil company made the money! But the software found the oil before the property was drilled. Later that year I spoke about the experience at a Research Review. My talk was titled, "Finding Oil in a Computer."

It was with great relish that I read The Universe in a Box: Simulations and the Quest to Code the Cosmos by Andrew Pontzen. If you buy the book feel free to download this image to print a bookplate; it's 1024x1024 px. Use Upscayl or something similar if you want it rendered at higher resolution. I produced it using Playground AI; the only prompt was "Cosmology". I tinkered with Samplers and other parameters, looking for something else. Getting this image was a side benefit.

The author could have delved deeply into sundry technical issues—there are many! Instead, he has skirted these, providing just a taste of some of them, in favor of the philosophy and motivations for making computer simulations of natural phenomena.

The terms "simulation" and "modeling" have overlapping meanings. In principle, a Model is the framework and the sets of parameters that define the physical structure and the physics rules to be followed, while a Simulation is the operation of the Model over a chosen span of time, producing a series of output data sets that describe the expected physical state of the modeled "thing" at one or more points in time, whether past or future. Note that a simulation can be done on equipment other than a computer. One story of the book is about a galaxy simulation done with light bulbs and photocells and a glorious tangle of wires.

Weather forecasting is one very visible result of computer simulation, seen daily (or hourly) on newscasts and in the various weather apps on our devices. There are a couple of dozen important models used by weather agencies the world over. One expression of these is the Spaghetti Plot of a hurricane's forecasted track, as produced by several models. The models differ in the importance they place on various aspects of the modeled system, including whether it represents the whole Earth or a hemisphere, or a couple of continents.

All weather models are based on a global General Circulation Model, in which the atmosphere and the land and sea surfaces in contact with it (and sometimes a surface layer of the ocean) are divided up into roughly ¼ to ½ million quasi-rectangular portions. Half a million to a million "cells" is about the most that modern supercomputers can handle. In general, spatial resolution is likely to be as coarse as 200x200 km! The Earth's surface area is about 510 million km², and the models have between 20 and 40 vertical layers (at present). A 40-layer model built from cells of 1x1x0.5 km would have more than twenty billion cells, so to get the count below one million requires using cells with an area of more than 20,000 km², which is about 143x143x0.5 km; most models are set up for grid squares of about 100x100 to 200x200 km (and half a km thick). The physics rules, primarily those relating to pressure and temperature relationships, are applied at the boundaries between grid cells.
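A back-of-envelope Python sketch of that sizing arithmetic, using only the rough figures above (surface area, layer count, and a one-million-cell budget):

```python
# Rough grid sizing for a global circulation model.
EARTH_SURFACE_KM2 = 510e6        # ~4 * pi * (6371 km)^2
LAYERS = 40
CELL_BUDGET = 1_000_000

columns = CELL_BUDGET / LAYERS                    # horizontal columns allowed
area_per_column = EARTH_SURFACE_KM2 / columns     # km^2 per column
side = area_per_column ** 0.5                     # edge of a square column, km
print(f"~{area_per_column:,.0f} km^2 per column, about {side:.0f} x {side:.0f} km")
```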

To get a "future radar rain" map with finer detail requires using "sub grid" rules, and partial simulations over short times and restricted areas, a subject that Dr. Pontzen discusses. Compared to Earth, the Universe is immensely more complex, and the problems of building an appropriate model and running simulations that may span billions of years, but don't take billions of years of computer time, are truly tough!

For example, consider a galaxy. On the scale of the whole Universe, galaxies are tiny and far apart.


This is the Hubble Ultra-Deep Field image, which shows about 10,000 galaxies (only one star is in the field, the really bright point with spikes caused by diffraction). The area of this image on the sky is 0.038°x0.038°, or about 0.15% of a square degree. It is about the size of the smallest thing you can see with your eye.

To a very rough estimate, although galaxies vary a lot in size, the really big ones seen here are a lot closer than the really small ones. The six or eight largest ones in this image are seen to be far from one another. If their intrinsic size is a little smaller than the size of "our" galaxy, the Milky Way, they are about 50,000 light-years across, and the average spacing between them is one or two million light-years. But larger-scale observations reveal that nearly all galaxies are strung out along strands in an immense web, with voids that contain no galaxies at all but span hundreds of millions of light-years.

One problem of computational cosmology that the author dwells on is that it is really hard to produce a cosmological simulation that doesn't result in far more galaxies than we actually observe. According to most models, this image "should" contain so many galaxies that there would be very little black space seen between them! A conundrum of computational cosmology is, "Why is space so empty?" I suppose all I can say is, "Stay tuned." I await a follow-on book on the subject as more is learned!

The smallness of galaxies compared to the intergalactic web, and the incredible smallness of the stars that make up the galaxies, and even more amazing smallness of planets, moons, and everything "solid" that we are familiar with, produce a huge problem of "stiffness" in any kind of simulation that seeks to span the entire range of sizes. The mathematical equations that drive simulations are differential equations (DE's). Numerical solutions of DE's inevitably introduce errors, which mathematicians control using various schemes, and such schemes are embodied in the computer codes that run simulations. However, these schemes are seldom perfect, and runaway effects can swamp the simulation if it is run outside of a carefully chosen range.

If a simple simulation includes two processes, and one runs 100 times as fast as the other, it is necessary to cater to the faster process or the results blow up. This time-scale contrast is called "stiffness". One must use time steps shorter than the time scale of the faster process, even though during such short steps the slower process doesn't do much. Now consider what happens if the time scale varies over a range, not of 100 to one, but millions to one, with numerous processes all across the time spectrum. Not only that, if 99% of the volume is empty, and the remaining 1% has similar ranges of "spatial stiffness", the problem compounds dramatically. A lot of the book deals with such things, but using more accessible language.
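Here is a minimal Python sketch of stiffness in the simplest setting I can think of: two exponential-decay processes, one a hundred times faster than the other, stepped with the naive (explicit Euler) method. A time step that resolves the fast process works; a longer step that would be perfectly adequate for the slow process makes the solution blow up:

```python
import numpy as np

def euler(rates, y0, dt, t_end):
    """Naive explicit time stepping of dy/dt = -rate * y for each process."""
    y = np.array(y0, dtype=float)
    for _ in range(round(t_end / dt)):
        y = y + dt * (-rates * y)
    return y

rates = np.array([100.0, 1.0])        # a fast process and a slow one
for dt in (0.001, 0.03):              # resolves the fast process / does not
    print(f"dt = {dt}: {euler(rates, [1.0, 1.0], dt, 1.0)}")
```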

The author also discusses dark matter and dark energy. Dark matter is probably quite real. It is needed to keep the stars in their orbits about their galactic centers, because the visible mass is not sufficient. This is not a small effect: the "extra gravity" needed is about five times what would be exerted by all the visible stuff we see. The current thinking is that roughly 85% of the matter in the Universe isn't affected by electromagnetic radiation, so we can't see it. Scientists are working hard to find out what kind of stuff could be so invisible but so heavy.

Side question for the author or other cosmologists who may come across this review: Do black holes consume dark matter that encounters them?

Anyway, dark matter and the properties we infer for it must be included in cosmological models for their simulations to make any sense.

Dark energy is the term applied to an odd effect seen when very distant supernovae are studied. They seem too dim. Their distances are determined from the redshift calculated from their spectrum and, if possible, the redshift of their host galaxies. There are distinct "lines" in the spectrum of any astronomical body that allow us to determine its composition and the speed with which it is moving, radially at least. The Hubble Constant (named for Edwin Hubble, as is the space telescope) characterizes the velocity-distance relationship.
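For readers who like numbers, here is a sketch of that velocity-distance relationship in its simplest, low-redshift form (v ≈ cz and d ≈ v/H0; the value of H0 below is just a typical round figure, not anything from the book):

```python
C_KM_S = 299_792.458     # speed of light, km/s
H0 = 70.0                # Hubble constant, km/s per megaparsec (typical value)

def distance_mpc(z: float) -> float:
    velocity = C_KM_S * z        # recession velocity for small redshift z
    return velocity / H0         # distance in megaparsecs

print(f"{distance_mpc(0.05):.0f} Mpc")   # ~214 Mpc for z = 0.05
```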

Determining the actual brightness of a distant object is not straightforward. Dust and gas in and between galaxies absorbs some light. The relationship between distance and "intergalactic extinction" ("extinction" to an astronomer means light is being absorbed) is thought to be well understood. When such calculations are applied to certain supernovae, a discrepancy is found between how bright they are and how bright they "should" be. The farther away they are, the greater the discrepancy. This indicates that they might be farther away than their redshift would indicate; the "Hubble Constant" would then be not so constant! This implies that cosmological expansion is speeding up, not slowing down as we would expect.

I personally look at two matters that need more study before I will seriously consider that dark energy is real. 

Firstly, it is not mentioned in the book that the kind of supernovae one must study to discern dark energy are Type Ia, which are produced by a special mechanism. Most supernovae result when a large star (8-20x the mass of the Sun) runs out of fuel and its core collapses. About a quarter of supernovae result instead from a white dwarf star being loaded up with matter from a nearby red giant that is shedding mass. The maximum mass of a white dwarf is 1.44 solar masses; at that point it collapses and erupts as a Type Ia supernova. Because of these mechanics, Type Ia supernovae have very similar maximum brightness, making them a "standard candle". However, I have looked in the literature, without success, for an indication of whether the composition of the white dwarf and/or its red giant companion might affect the brightness of a Type Ia supernova.

In the very early Universe there was hardly anything except hydrogen and helium. The first supernovae were all Type II, exploding large stars that had forged hydrogen into more helium, and then helium into heavier elements, up to iron. Over time, the abundance of heavier elements in the Universe increased. To astronomers, all elements from lithium on up are called "metals" for convenience. Metallicity is a measure of the percentage of "metals" in a star or galaxy. Our Sun's metallicity, at its visible surface, is 1.3%. Its age is 4.5 billion years, and it has not undergone fusion reactions that could change its metallicity, but an unknown amount of interstellar "stuff" has fallen into it; this is probably quite small in proportion to its total mass. Thus, a little over 1% probably represents the metallicity of this part of the Universe 4.5 billion years ago. The metallicity of the stars in a galaxy also varies with distance from the center, but not over a huge range. The bigger difference is seen between "Population I" stars, which are younger and have higher metallicity, and "Population II" stars, which are older and have something more like the metallicity of the Milky Way when it first formed, perhaps 10-12 billion years ago. That is roughly 1/10 or less of our Sun's metallicity, or less than 0.1%.

Very early galaxies and their stars had very small metallicities, ranging from 0.001% down to nearly zero. Therefore, so did the earliest Type Ia supernovae. A question I have not seen answered:

We know that white dwarf stars are composed primarily of carbon and oxygen. They are known to have some metals, because Type Ia supernovae are identified partly by silicon lines in their spectra. BUT: Is the peak brightness of a Type Ia supernova significantly affected by the proportion of elements heavier than oxygen?

Secondly, is it possible that dark matter interacts very slightly with electromagnetic radiation? Simply put, the Universe's age is considered to be 13.8 billion years. At an age of 1.38 billion years, its "size" was 1/10 of its present "size", and the concentration of both ordinary matter and dark matter would have been, on average, 1,000 times greater. Somewhere in between, say at an age of 4.4 billion years (the square root of 1/10, times 13.8), the "size" would have been about 0.32 of the current size, and the concentration of both ordinary matter and dark matter would have been about 32 times greater than at present. If there is even a slight interaction, "dark matter luminous extinction" could be a genuine effect, yet we would be very hard put to determine whether the dark matter that must be all around us has a measurable influence on light.
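A tiny sketch of the scaling in that paragraph, keeping its simplifying assumption that the "size" grows in proportion to age, so that density goes as 1/size³:

```python
# Relative density of matter (ordinary and dark) at three ages.
for age in (1.38, 4.36, 13.8):     # billions of years
    size = age / 13.8              # fraction of the present "size"
    density = 1.0 / size ** 3      # relative to the present average
    print(f"age {age:5.2f} Gyr: size {size:.2f}, density ~{density:,.0f}x")
```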

For the time being I consider that it is much, much more likely that "dark energy" is a phantom, and will eventually be found not to exist.

That is a significant digression from the discussion of the book. The author discusses the utility of cosmological simulations of various kinds. They aren't just a way for us to have a "pocket Universe" to play with, but they help us understand what might have occurred at various stages of the evolution of the Universe, or of groups of galaxies, or of stars and star clusters. Unlike weather forecasting, Universe simulation focuses on retro-casting, trying to reproduce how things worked out over some interesting span of past time, whether measured in centuries, millennia, or billions of years. To know where we really are we need to know what came before. Looking at distant things, as the Ultra Deep Field does, lets us look back in time. Things were different way back then, and computational cosmology is a powerful tool to help us understand it all. We've made a bit of a start; we're just getting going!

The author also asks whether it is plausible that we are living in an über-simulation inside some super-Matrix run by super-beings. He gets into that because he gets asked about it frequently. I'll mention one thing that he does not: one human brain has complexity of the same scale as a good chunk of the non-human Universe, and all of us together are more complex than the whole rest of the Universe (unless there are lots and lots of alien species!). In the Cosmos series by Carl Sagan, decades ago, it was stated that there are probably 100 billion galaxies in the observable Universe, with an average population of 100 billion stars each. The number of galaxies is probably more like a trillion. The number of stars is thus a number with 23 digits.

What's in a brain? The cortex has 16 billion neurons and the cerebellum has 70 billion. Each neuron has about 5,000 connections to other neurons. The 100 billion smaller "glial cells" also contact numerous neurons and large numbers of each other. The number of connections is thus a number with 15 digits. The number of humans is about 8 billion, a 10-digit number. So the "total human connectome" is about 100 times as great as the number of stars in the Universe. Another number of similar size is the number of molecules in 18 grams of water (a quantity known to chemists as a "mole"), which is a 24-digit number starting with the digit 6. If one could somehow use each water molecule in a tablespoon of water as a computer bit, it would take ten tablespoons to have enough molecules to devote just one "bit" to each connection in the sum total of all human brains. That's the bare bones of what's needed to produce The Matrix. And that's just one intelligent species on one planet. I'd say that if Moore's Law gallops along unimpeded long enough (but it won't; it's already faltering), it would take hundreds of doublings, or at least 1,000 years, for a big enough, fast enough computer to be produced (by Someone) that could simulate the entire Universe in real time. Of course, by making the computer's time steps for each second of real time actually take, say, a century, a much smaller computing system could do the work. How could we tell? Dr. Pontzen doesn't know, and neither do I.
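Here is that counting sketched in Python, using only the neuron figures quoted above (the glial connections mentioned would push the total higher):

```python
neurons = 16e9 + 70e9            # cortex + cerebellum
connections_per_neuron = 5_000
humans = 8e9

per_brain = neurons * connections_per_neuron   # ~4.3e14, a 15-digit number
connectome = per_brain * humans                # all human brains together

stars = 1e12 * 100e9             # a trillion galaxies x 100 billion stars each
print(f"per brain:  {per_brain:.1e}")
print(f"all humans: {connectome:.1e}")
print(f"stars:      {stars:.1e}")
```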

A very enjoyable book. You don't have to be a computer geek like me to understand it.

Saturday, March 09, 2024

Guidance Parameter in Playground AI

 kw: experiments, ai art, generated art, artificial intelligence, simulated intelligence, comparisons, photo essays

Another parameter to explore in Playground AI is Guidance. This influences how closely the generated image conforms to the prompt, so they say. I decided to find out. In earlier experiments I had kept a few images I particularly liked. One had the seed 260650348, and I decided to use that for this experimental project, and to use only the Euler a Sampler.

The three Models have very different sets of Guidance parameters:

  • Stable Diffusion XL (SDXL) has levels from 0 to 30; Guidance above 30 is available only to subscribing (paying) users. The default is 7, and the FAQs recommend mostly staying between 7 and 10. After some pre-work I decided to use 2, 4, 7, 11, 16, 24, and 30.
  • Playground v2 (PGv2) has levels from 0 to 5. The default is 3. I determined that levels 0, 1, and 2 produce identical results, so I decided to use levels 2, 3, 4, and 5.
  • Playground v2.5 (PG25) doesn't use a Guidance parameter. It also doesn't have multiple Samplers. It's a "point and shoot" generator.

I've learned from others' reviews and some "help" YouTube videos that longer prompts give the software more to work with. It stands to reason that there could be a greater difference among Guidance levels with a long prompt, compared to a short one. I decided to test four prompts spanning a wide range of lengths; the word counts below are "meaningful" words, ignoring articles:

  1. 1 word: Cosmology
  2. 5 words: Quaint village near a mountain stream
  3. 13 words: A rocky beach grading into a sandy beach below sea cliffs beneath a partly cloudy sky
  4. 28 words: Fantastical clock with a big dial for the time using roman numerals, the second hand on a small dial of its own, and indicators for month and day and phases of the moon

I'll present the resulting images half size (512x512) in pairs or groups of 4, beginning with PGv2 and Prompt 1.



A number of trends are seen as Guidance (G from now on) goes from 2 to 5:

  • The sky arch begins with a look like a multiverse, and goes to more of a dynamic universe look.
  • The observer is bigger at G4 and 5, while the child seen at G2 turns to a rock which progressively shrinks.
  • Trees appear at G3 and move around.
  • The nebula of G2 gradually turns into a galaxy.
  • Sundry planets come and go.

However, there is no dramatic change in the overall look of the image.

Next, 7 images from SDXL, plus one by PG25.





The SDXL images all have a Medieval look to them. The two that look best are the third and fourth, with G07 and G11. Above G16 they kind of go off the rails. At G30 in particular the frame is quite detailed, but the rest of the image has lower quality, as the FAQ warned. The PG25 image is quite fetching, similar to the central portion of the PGv2 images, with a kind of swirly surround. It could be fun to run a bunch of these with the random Seed turned on.

Now for Prompt 2, the village by a stream. PGv2 first:



As before, these are all very similar, with added details at each increased G level. Next, SDXL and PG25.





The frame seen in the earlier series is still with us. Here, the overall look gets a dramatic overhaul after G11. The image for G16 has a bookplate look, while G24 and G30 seem to emanate from confusion, perhaps due to conflicting requirements.

The PG25 image is very pleasing, similar to any of the four PGv2 images, but more detailed and dramatic. Next, the beach scene, PGv2 first.



The differences between these are a matter of increasing detail. I note that the main cliff attains an overhang in the fourth image, and while it looks like the sun is higher, it's just that the second headland is lower, with a notch in it. Now for SDXL and PG25.





I see that these retain the frame. The first two images, at G02 and G04, are from a high perspective; it would have taken more words to specify where eye level is. The next one, G07, is about what I had in mind. The fourth image, at G11, is very good and G16 is almost as good, if a little exaggerated. After that things go downhill, and the frame is even breached.

PG25 has a very good look, with more diverse scenery than the PGv2 images. Now for the final series, the clock, beginning with PGv2.



I had something in mind when I wrote Prompt 4, which I'll get into below. Only the fourth image, with G5, appears as if it could be a real clock. All four of these have the "smaller dial for the second hand" concentric with the main dial. That's not what I had in mind, but I didn't specify "next to" or "below" the main dial. Now for SDXL and PG25.





It's pretty clear by now that, however many words one uses, the best range is usually from G07 to G16. The first two images are rather primitive, and the last two go wonky. I suppose the best is at G07.

PG25 has produced an entire clock, not just a dial in a frame. It still doesn't meet all the criteria. 

Here is what I had in mind, a clock with a moon dial and a separate second hand dial above the center. The day indicator is in the square window; numerous variations on showing days have been produced. This is a modern dial, in a style going back 150 years.


Had I specified "many dials" I might have expected something more like this, a French clock from the Louis XIV era. The "dial" at the top is actually a speed control, typically adjusted for the seasons because temperature affected the length of the pendulum.

This last image is from a clock tower in Belgium. Clearly, the AI interpretation of "fantastical clock" is still somewhat limited.



Monday, March 04, 2024

The new artist on the block

 kw: ai art, generated art, artificial intelligence, simulated intelligence, comparisons, photo essays

A year and a half ago I began to use Dall-E2 as my "hired painter". A favorite pastime has been generating landscapes, particularly for use as Zoom backgrounds. This image is one I have used a lot:


The prompt for this was "A calming forest scene with wildflowers in a meadow, a stream, and a small pond, landscape painting." I don't recall how many times I ran the prompt, probably no more than twice, before I saw a 1024x1024 pixel square I liked, shown here. Then I outpainted (extended) it. The images Dall-E2 produces are PNG files, which are 5-7 times as large as a JPG saved with a 95% quality factor.

The result is 2624x1472 px, including the color bar used by Dall-E to identify its products. I cropped out a 2572x1447 portion, which is very close to the 16:9 aspect ratio needed for HD wallpaper. (As with any Blogger image, you can click on these to see them full size. The first image was reduced by about half from the original.)

Just a few days ago I got a notice from Bard, Google's version of ChatGPT, that its name was being changed to Gemini, and that it could now generate images. In the past few months I got access to a free version of Dall-E3 through Bing, and discovered another AI image program called Playground, that I've written about recently.

When I started to use Dall-E2 in 2022 I also tested the other two "legacy" AI art generators, MidJourney and Stable Diffusion. I found them more limited than DE2, and they are more expensive to use, so I have ignored them since then. I recently took another look at MidJourney, but it runs as a Discord service, and I find Discord hard to use; it's also still too expensive. I'll mention more about Stable Diffusion in a moment.

I decided to test the products that I do use on the same prompt. Later I added another prompt, and we'll come to that.

First, I ran the "calming forest scene" prompt with Dall-E3 a few times, and picked the square shown here as the one most pleasing to me. DE3 doesn't yet do outpainting (at least not in the free version). Where the free version of Dall-E2 allows 15 free "Generate" steps per month, the free Bing version of Dall-E3 allows 15 per day. Running a prompt in Dall-E3 yields four square JPG images.

It is immediately clear that this image is more detailed, while retaining the look of a painting. Also, there is no color block or other "signature".

Both versions of Dall-E adhere pretty well to the prompt. Shorter prompts result in more variety. Prompt construction and editing become tools to negotiate with the product to get an image you want.

Secondly, I ran that prompt with Gemini. Since Gemini is also a chatbot, one must say, "Create a calming forest scene…", for example. You can ask Gemini for more suggestions, and get its help producing a prompt. Gemini also can produce four results per prompt, but sometimes it gives only two or three. At the moment, you cannot ask for human figures to be included; Google got in trouble when its early release of Gemini images yielded nearly all "minority" (non-Caucasian) faces.

The default size of Gemini images is 1536x1536, and they are JPG files. I reduced this one to 1024x1024 to compare with the other programs.

The level of detail is between that seen for Dall-E2 and Dall-E3. There is also a painterly look. I haven't tried asking for photographic detail.

Gemini claims that you can ask it to produce images of other sizes, from 256x256 up to 1536x1536, and other ratios, such as 1024x576 (an HD ratio), but when I included size instructions in the prompt, I still always received 1536x1536 squares. Queried about this, Gemini said the capability was not yet there. I'd love to be able to produce 1920x1080 images from the get-go, but that has to wait.

Now, with Playground there are complications. Playground has a "side version" called Playground.AI that can produce images that are 1920x1080 and a wide variety of other sizes, but after about 20 prompts a countdown reaches zero and you need to subscribe. I haven't checked whether more free images become available after a month or whatever; I had other issues, so I stopped using it. The "big version" at playgroundai.com has so many controls and options that it is hard to pick a single "default":

  • Models: presently three, Stable Diffusion XL (they once included Stable Diffusion 1.5, but have dropped it), Playground v2, and Playground v2.5.
  • Samplers: the mathematical methods used for the Diffusion operation, as many as 12 in the paid version and 8 in the free version.
  • Filters: dozens of them, affecting the look of an image in ways ranging from subtle to dramatic.
  • Seed: you can turn the random number generator on or off to get a different Seed for each image.

To simplify things for this experiment I let the Seed be random, I didn't use any Filters, and I used the following setups to produce five images, selected from groups of four under these conditions:

  1. SDXL (Stable Diffusion XL) with DPM2 Sampler
  2. SDXL with Euler a Sampler (Euler a is the default when you start to use Playground)
  3. PGv2 (Playground v2) with DPM2 Sampler
  4. PGv2 with Euler a Sampler
  5. PG25 (Playground v2.5), which doesn't use an explicit Sampler (of course there is a Sampler buried inside somewhere)

SDXL with DPM2. This and the following four images have quality and detail equal to DE3. This has a brighter look overall, including a little drama in the sky. I'll compare its siblings below with it.
SDXL with Euler a. Euler a in general has a softer look than DPM2.
PGv2 with DPM2. This is even brighter than SDXL, even a bit edgy, though the sky is more bland. I like the misty aspect of the background for all these, but this one is more pronounced.
PGv2 with Euler a. A little softer, as before, but this one also has a better sky.
PG25. This is even brighter than the others, almost too bright. All five have a greater aesthetic quality than DE2 and Gemini, and they match DE3 while having a rather different feel.









They are all beautiful. But we are only half done here. All these art generators have the characteristic that they produce more widely diverse images when given very short prompts. Diverse not only from one product to the next, but from one image to the next in any product.

I happened to be reading a book about simulations in cosmology, so I decided to use a one word prompt: "Cosmology". This time I ran the prompt twice with each product. Here, for each product and variation I'll show all four responses to each issuance of the prompt. Dall-E2 is first:



These are screen shots of the image sets Dall-E2 produced. There's lots of variety from image to image, not just of subject matter but of style. Most of these have a focus: planets, galaxies, etc. The last image, at lower right, seems to have a wider scope, and most closely evokes "cosmology".

The next two sets of four are from Dall-E3, which displays its results in a block rather than a line, and with a black background:



Here, each set of four has a common theme, but the theme varies, as does the style, from one set to the next. Both sets have a galactic focus, but in the first set, two appear to host quasars.

Next is Gemini.



The first set is similar in concept to Dall-E2, with greater diversity. Its first image seems to best encapsulate "cosmology", having great scope. For the second set, Gemini produced only three images, all similar. One may click "Generate more", which I did, and it came up with two more images, one of which is shown here. The one not shown is different from all the others.

Now we turn to Playground, which produces groups up to four in a line. First, SDXL with DPM2 Sampler:



Six of these echo historical, pre-Enlightenment era, concepts of the Universe, in sundry ways. I'd say that the first image in the first set best illustrates the prompt. It seems to segue from Earth to infinity. Only the second image in the second set includes something vaguely like a galaxy.

Switching to the Euler a Sampler produced these:



Five of these bear some resemblance to the six "historical" images above, but the overall feel of these is different. None of these really fit the scope of the prompt.

Next we'll see PGv2 with the DPM2 Sampler:



These have less overall variety. The last image of the eight seems to go the farthest "out there". It's curious that PGv2 put people in nearly every image.

Now for the switch to the Euler a Sampler:



Wow! The first of the images would make a great bookplate. The rest are similar to the sets with the DPM2 Sampler, with the same tendency to include a person, if not persons. The third image in the second set is the closest of these eight to the "cosmology" concept, as I envision it.

Finally, let's see what PG25 does:



I still see a person or two, but all of these better approach the prompt in concept, with the eighth perhaps being the best. It is interesting that, with the one exception noted above, the images produced by Playground don't show things that look like galaxies. 

I see this collection of images as a catalog of "looks" I can refer to when choosing the way I want a generated image to appear. It is evident that Playground has the greatest variety of ways it can respond to a prompt, but that each of the other three art engines has something unique to offer.