kw: space aliens, DNA, central dogma of genetics, genetic code

Do you think Spock could really exist? Is a hybrid between humans (or any Earth species) and an interstellar alien possible? Alien-human hybrids figure prominently in some writers' stories...and they are the core fear driving the "alien abduction" folks. Could it happen???

The Central Dogma of genetics is that DNA is transcribed into RNA, and the transcribed RNA is used to make Proteins. Most of these proteins are Enzymes, that is, peptide catalysts. That isn't all there is to it, because some RNAs are Ribozymes, behaving as Enzymes though they aren't proteins. It turns out that biochemistry is nearly all geometry, and you can form a desired shape from amino acids (proteins) or from RNA. Also, in trying to determine if "junk DNA" is really junk, we are finding that some DNA has a regulatory function within the nucleus, plus there are at least five or six levels of regulatory activity by various protein families.

So, DNA is Transcribed to RNA, and most RNA is Translated to Protein. The translation function of genetics is based on the Genetic Code, which determines how the 64 3-Base Codons are distributed among the 20 Amino Acids and the Start and Stop functions. As it happens, in most Earth creatures, some amino acids correspond to as many as six RNA codons, some to four, and so on down to just two (methionine and tryptophan) that correspond to a single codon each. There is a minor difference between the code for Prokaryotes and Eukaryotes, but 62 of the codons are used identically in all cells...on Earth.
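To make the codon-set sizes concrete, here is a minimal Python sketch that tallies how many codons each amino acid gets in the standard code. The 64-letter string is the standard translation table written in the conventional T, C, A, G base order (DNA letters, so T stands in for U); the names `STANDARD_CODE`, `codon_table`, and `degeneracy_histogram` are just illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"  # conventional ordering for printed codon tables
# One letter per codon, TTT first and GGG last; '*' marks a Stop codon.
STANDARD_CODE = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

def codon_table():
    """Map each of the 64 codons to its one-letter amino acid (or '*')."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, STANDARD_CODE))

def degeneracy_histogram():
    """Return {codon-set size: number of amino acids with that size}."""
    per_amino_acid = Counter(aa for aa in codon_table().values() if aa != "*")
    return dict(Counter(per_amino_acid.values()))

if __name__ == "__main__":
    # Sizes 6, 4, 3, 2 and 1 are shared by 3, 5, 1, 9 and 2 amino acids.
    print(sorted(degeneracy_histogram().items(), reverse=True))
```

The three codons left over after the 20 amino acids have taken their 61 are the Stops.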

It appears that many of the amino acids used in active sites have larger codon sets, while "backbone" AAs have smaller sets. AAs that are "important" have more codes than those that have more of a "scaffolding" function. This is not totally consistent, but it explains one source of robustness; many point mutations make no difference in the shape of the protein encoded.

Recently, we find that all parts of this system are malleable. It is possible to synthesize bases that could be used in place of any of the A, C, G, and T (or U in RNA) bases used in Earth DNA. It is also possible to synthesize a great variety of "extra" amino acids. In fact, there are a couple of "extra" AAs that seem to be used by certain bacteria, and some researchers claim that there are a dozen or so variants of the "Earth" genetic code, found among bacteria only.

Researchers have also created the pieces to get certain bacteria to synthesize and use, in proteins, an amino acid not normally found in nature. So, while nearly all Earth life uses the familiar four bases and twenty amino acids, there is no guarantee that life forms that arose on a distant planet do so.

Since we know that all 64 Codons are used, we can think of a few simple variations that would work equally well. If you have a string of three symbols, there are six ways you can arrange them. That means, if you were to perform the same rearrangement on every codon in the set, there are six "genetic codes" that work exactly as the Earth one does.
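As a quick sanity check on the "six ways" claim, this small Python sketch lists the rearrangements of the three codon positions and confirms that each one maps the 64 codons onto the same 64 codons, one-to-one. The helper names `rearranged` and `is_one_to_one` are made up for illustration.

```python
from itertools import permutations, product

BASES = "ACGU"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# Every ordering of the three codon positions, including "leave it alone".
ORDERS = list(permutations(range(3)))

def rearranged(codon, order):
    """Shuffle the three bases of a codon according to `order`."""
    return "".join(codon[i] for i in order)

def is_one_to_one(order):
    """True if applying `order` to all 64 codons yields all 64 codons again."""
    return {rearranged(c, order) for c in CODONS} == set(CODONS)

if __name__ == "__main__":
    print(len(ORDERS), all(is_one_to_one(o) for o in ORDERS))
```

Because each rearrangement is one-to-one, a code built by applying it to every codon keeps exactly the same set sizes per amino acid as the original.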

If instead you determine how many different ways 64 Codons can be distributed among 20 AAs, the number isn't just in the billions, or in the trillions: it is an immense number with more than seventy digits. If we just look at a near-even distribution, with 60 codons parceled out three at a time to 20 items (e.g., the 20 AAs), and the four remaining codons going two each to two more items (e.g., Start and Stop), the value is 64!/[(2!)^{2}(3!)^{20}], which comes to 8.7x10^{72}. There are many, many other ways to partition how many codons each AA gets.
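The near-even count is easy to check with exact integer arithmetic. This sketch counts the ways to hand all 64 codons out to 20 items that get three codons each plus two items that get two each; the function name `near_even_codes` and its parameters are only illustrative.

```python
from math import factorial

def near_even_codes(n_codons=64, triples=20, pairs=2):
    """Ways to split n_codons distinct codons into `triples` labeled groups
    of three and `pairs` labeled groups of two (a multinomial coefficient)."""
    if 3 * triples + 2 * pairs != n_codons:
        raise ValueError("group sizes must use up every codon")
    return factorial(n_codons) // (factorial(3) ** triples * factorial(2) ** pairs)

if __name__ == "__main__":
    count = near_even_codes()
    print(f"{count:.1e}", len(str(count)))  # 8.7e+72 73
```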

Many of these will be less favorable, because certain 'sensitive' AAs don't get sufficient protection from mutation, but there are huge numbers of relatively 'good' codes that could be used, more than enough that each planet in the visible universe (assuming a billion or so per galaxy, and perhaps a trillion galaxies) can choose among billions of possibilities.

As a result, even if every star has a planet bearing life, the likelihood that two of them will have the same genetic code, or even remotely similar genetic codes, is effectively zero. We can determine how close to zero, by means of the "birthday paradox". You may know that, if you get 23 or more people in a room, there is a better-than-even chance that two of them will have the same birthday.

Similarly, if you assume that blondes have at most 100,000 hairs on their head, in any city with more than 100,000 blondes, there will be quite a few pairs with exactly the same number of hairs on their head. What is interesting is, if you have any group of at least 373 blondes, there is a better-than-50% chance that two of them will have the same number of hairs. Which two? You gotta count a lotta follicles!
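Both figures can be checked directly. The sketch below multiplies out the chance that nobody matches, one person at a time, and stops as soon as a match becomes more likely than not. It assumes every birthday (or hair count) is equally likely, which is a simplification, and the function name is hypothetical.

```python
def smallest_even_odds_group(possibilities):
    """Smallest group size whose chance of at least one shared value
    exceeds 50%, assuming all `possibilities` are equally likely."""
    p_no_match = 1.0
    group_size = 0
    while 1.0 - p_no_match <= 0.5:
        # The next person must avoid the `group_size` values already taken.
        p_no_match *= (possibilities - group_size) / possibilities
        group_size += 1
    return group_size

if __name__ == "__main__":
    print(smallest_even_odds_group(365))      # birthdays: 23
    print(smallest_even_odds_group(100_000))  # hair counts: 373
```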

Now, if the universe of possible DNA codes is of the order of 10^{72}, what are the chances that there is an exact match between two codes, if the number of inhabited planets in our galaxy is ten billion...or in the universe, perhaps a sextillion (billion trillion)? Both are wild guesses, but plausible.

In the first case, the probability of at least one match is about 1-exp(-n^{2}/2N), with n planets and N codes; for n = 10^{10} and N = 10^{72} that is 1-exp(-5x10^{-53}), which has 52 zeroes after the decimal point before you get a nonzero digit. In the second case, the match probability is 1-exp(-5x10^{-31}), which has 30 zeroes ahead of the first nonzero digit. Either way, it's way, way smaller than a chance in a million. Turning the question around, how few "viable codes" would there have to be, so that in the universe, or in our galaxy, the chance of at least one match is at least 50-50? For the galaxy, there must be fewer than about 10^{20} actual DNA codes in use, and for the universe, fewer than about 10^{42}.
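For anyone who wants to redo this as an order-of-magnitude check, here is a short Python sketch using the usual birthday approximation, P = 1-exp(-n^2/(2N)) for n planets drawing from N equally likely codes. The function names are made up, the planet counts are the wild guesses from the text, and `expm1` is used because the probabilities are far too small for an ordinary `1 - exp(...)` to survive floating-point rounding.

```python
from math import expm1, log

def approx_match_probability(planets, codes):
    """Birthday approximation: chance that at least two planets share a code."""
    return -expm1(-planets**2 / (2 * codes))

def codes_for_even_odds(planets):
    """Number of codes at which the match chance is about 50-50.
    Solves 1 - exp(-n^2 / (2N)) = 1/2 for N."""
    return planets**2 / (2 * log(2))

if __name__ == "__main__":
    for label, planets in [("galaxy", 1e10), ("universe", 1e21)]:
        p = approx_match_probability(planets, 1e72)
        print(f"{label}: P(match) ~ {p:.1e}, "
              f"even odds at ~ {codes_for_even_odds(planets):.1e} codes")
```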

So, we start with a number of the order of 10^{72}, and have to cut it down by some 30 orders of magnitude, just to make some chance that two life forms, that arose on different planets, could hybridize. Not a good bet.

Added note January 2007: There are currently seventeen "genetic codes" known, in use by Earth organisms (see the Wikipedia article Translation (genetics) ). The vast majority of cells use the "standard code", but there are many minor variants used by mitochondria, a couple used by plastids, and a few others used by certain eukaryotic microbes.