Thursday, November 19, 2009

No gut, no nose

kw: work, musings

This is a pre-review riff; I've been reading Shop Class as Soulcraft by Matthew B. Crawford. He worked for a year, between getting his MA and his PhD, as an Indexer and Abstractor. It happens that half of my work is as an Indexer and Abstractor, but in a somewhat different setting than his:
  • He worked for a company that sold the abstracts and indexes. I work for an internal unit of a corporation, producing the metadata used for searching a large corpus of technical reports and related documents.
  • He worked with both business reports (his expertise) and technical/scientific documents (way, way outside his field). I work strictly with technical and scientific documents, which are right up my alley: my expertise is in all the hard sciences plus computer science. (Yes, all as in ALL; that's why I am a polymath.)
  • He wrote a fresh abstract for every document, ignoring any abstract written by its author. I place great reliance on the author's abstract, figuring he or she has some idea of the substance of the document!
  • He had a quota of 28 documents per day. I work under a loose expectation of 100-200 documents per month, but can take a full day on one complex report if needed. For some, I can complete the work in "reading time plus ten minutes".

Such work is not unlike that of Scribes, who also had deadlines and quotas. One needs a certain kind of personality to work in a Scriptorium.

By the way, the monk in this image should be facing the window, so that the light would rake across the page upon which he writes. A stylus would have been used beforehand to incise faint scratches in the page, to guide the lines of text and outline any illumination. You can see those lines in raking light, but they are invisible under normal illumination (and by "normal" I mean light striking the page at nearly a right angle).

Dr. Crawford writes of the subversive hi-jinks some of his fellow "knowledge workers" engaged in, things like putting ridiculous text into an abstract. They were compensated only for productivity, with no measure of "quality" defined, and the pace of the work kept them bored out of their minds. In my work environment, we get constant, often blunt, feedback about quality, because our work is immediately exposed to the entire research community.

Crawford's stories add a dimension to an ongoing debate in my workplace. Some among us expect automated methods to soon replace human indexing and abstracting. Already, many of our clients are using full-text searching to find the documents they need, making little use of the indexing terms that have been applied to a quarter-million documents' metadata over the past fifty years. But for documents older than ten years, only the abstract can be searched automatically, so those clients still depend on some amount of quality abstracting.

Will the day come when abstracting can be done automatically? For some of my work I use Copernic, one of the best "gist" software packages around. The best it can do is gather selected sentences from a document; it cannot condense those sentences into the crisp format a good abstract demands. I used Copernic to condense The Prince by Machiavelli to 10% of its size. I could then read the book at a sitting (I was already familiar with its content). This gist of the book is not too bad (damning with faint praise, here), but it does miss a lot. Sometimes it does well, getting the sentence with "the point" and leaving out the discursive examples, but sometimes it snatches a sentence from the middle of such an example, leaving a dangling thought.
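
To make the limitation concrete, here is a rough sketch, in Python, of what this kind of sentence-gathering amounts to. It is a generic frequency-based approach, not Copernic's actual algorithm, and the sample text and function names are my own inventions: score each sentence by how many of the document's common words it contains, and keep the top few verbatim.

    import re
    from collections import Counter

    def gist(text, keep=3):
        """Return the `keep` highest-scoring sentences, in their original order."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        freq = Counter(re.findall(r"[a-z]+", text.lower()))
        def score(sentence):
            tokens = re.findall(r"[a-z]+", sentence.lower())
            return sum(freq[t] for t in tokens) / (len(tokens) or 1)
        ranked = set(sorted(sentences, key=score, reverse=True)[:keep])
        return [s for s in sentences if s in ranked]

    sample = ("A prince must learn from the fox and the lion. "
              "The fox cannot defend itself from wolves, and the lion cannot avoid snares. "
              "Therefore a wise prince imitates both. "
              "Many examples from antiquity could be listed here.")
    print("\n".join(gist(sample, keep=2)))

The output is a selection of existing sentences, never a condensation of them, which is exactly the shortcoming I described.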

In a recent discussion about automated "knowledge work", I told a colleague that, while we have aphorisms such as "Go with your gut" or "That has to pass the smell test", we don't expect computer software to have the emotional equipment such proverbs imply. A computer has no gut, no nose. Can computers ever gain either gut or nose?

I read last year that simulation software running on a modern 2-core microprocessor can now simulate, in real time, the activity of a single neuron, or, with minor simplifications, a few interacting neurons (3 or 4). That means, given appropriate software, that a supercomputer in the building where I work, which contains 16,384 microprocessors, could just about manage to model the behavior of a flatworm, which has a few thousand neurons.
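
A back-of-envelope check of that claim, with the per-chip figures quoted above and a stand-in value for the flatworm's "few thousand" neurons (the exact count is not something I am asserting):

    # Back-of-envelope check; the per-chip figures are the ones quoted above,
    # and the flatworm count is a stand-in for "a few thousand".
    chips = 16384
    detailed = chips * 1        # one fully detailed neuron model per 2-core chip
    simplified = chips * 3      # roughly 3-4 simplified neurons per chip
    flatworm_neurons = 5000     # hypothetical stand-in figure
    print(detailed, simplified, flatworm_neurons)

Sixteen thousand detailed neuron models, or some fifty thousand simplified ones, against a few thousand neurons: just enough headroom, given the right software, to run the worm in real time.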

The neuron evolved a billion years ago; flatworms have been with us 600 million years or more. The human brain has ten billion neurons, and about a quarter billion of these extend into the body; the brain also contains 100 billion glia, supporting cells that modify the behavior of the neurons. The total computing capacity of all the microprocessors on earth is not yet equal to one human brain.

But before there were neurons, there was a chemical signaling system, which is present in single-celled critters also. This is the basis of emotions, including our "nose" and our "gut". It is faster and more specific than the cognitive reactions of the brain. To be human is to be both intelligent and emotional, and many of the things we do without thinking, such as turning a corner without hitting the wall while walking, are still very, very hard to do in the purely cognitive way that computers work.

We understand enough about cognition to do, by brute force, some remarkable things, such as build a computer that can beat a grandmaster at Chess. But it is like the "general" method for working a Rubik's Cube. My son and all his friends can unscramble a Cube in about one hundred moves. But some people can look at the cube and unscramble it in 15 or fewer moves. Seven moves are all it takes to scramble a cube so that either 15 or 100 moves are needed to restore it. Cognition has its limits. I suppose an appropriate computer method could grind for a moment, then display the seven moves needed to restore a Cube, but it might examine many thousands of move sequences in its own memory to back-calculate what those seven moves were.
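
Here is a toy illustration of that brute-force grinding, in Python. It is not a real Cube solver; the "puzzle" is just a small ring of numbered tiles with three invented moves. Iterative deepening tries every sequence of length 1, then 2, then 3, and so on, so the first fix it finds is the shortest, yet it may examine thousands of sequences to discover a handful of moves.

    from itertools import product

    SOLVED = tuple(range(6))

    def rotate_left(state):             # one "move": rotate the first four tiles left
        return state[1:4] + state[:1] + state[4:]

    def rotate_right(state):            # its inverse
        return state[3:4] + state[:3] + state[4:]

    def swap_ends(state):               # another "move": swap the last two tiles
        return state[:4] + state[5:] + state[4:5]

    MOVES = {"L": rotate_left, "R": rotate_right, "S": swap_ends}

    def solve(start, max_depth=7):
        """Find the shortest move sequence restoring SOLVED, counting attempts."""
        examined = 0
        for depth in range(1, max_depth + 1):
            for seq in product(MOVES, repeat=depth):
                state = start
                for name in seq:
                    state = MOVES[name](state)
                examined += 1
                if state == SOLVED:
                    return list(seq), examined
        return None, examined

    # Scramble with three moves, then let the search back-calculate a fix.
    scrambled = swap_ends(rotate_left(rotate_left(SOLVED)))
    solution, examined = solve(scrambled)
    print(solution, "found after examining", examined, "sequences")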

Back to Indexing. Copernic and a number of similar programs can gather key terms from a document using statistical methods. But the program has no sense of "aboutness". For example, in the document base I shepherd, there may be hundreds of documents containing terms such as "rust", "oxidation", "scaling" and "corrosion". Only for one in ten has the indexing term CORROSION been applied. Only for those was it deemed that CORROSION is an important concept, that the document is in some measure "about" corrosion (and some that use "rust" are about crop diseases, and are indexed accordingly). For the rest, it is an incidental concept. That is why we don't call ourselves just Indexers, but Conceptual Analysts.
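
To show what "statistical methods" means here, a minimal sketch in Python of the usual approach, TF-IDF term weighting (a toy example of my own, not what Copernic actually does): a term scores high if it is frequent in one document and rare in the rest of the collection.

    import math
    import re
    from collections import Counter

    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def top_terms(doc, corpus, n=5):
        """Rank the terms of `doc` by TF-IDF against the whole collection."""
        tf = Counter(tokenize(doc))
        scores = {}
        for term, count in tf.items():
            doc_freq = sum(1 for d in corpus if term in tokenize(d))
            idf = math.log((1 + len(corpus)) / (1 + doc_freq)) + 1
            scores[term] = count * idf
        return sorted(scores, key=scores.get, reverse=True)[:n]

    corpus = [
        "Corrosion and scaling were observed on the heat exchanger tubing.",
        "Wheat rust is a fungal disease affecting cereal crops.",
        "The report describes oxidation kinetics of the alloy at high temperature.",
    ]
    print(top_terms(corpus[0], corpus))

The ranking does surface "corrosion" for the first document, but filler words like "and" and "were" score just as high, and nothing in the arithmetic distinguishes a report that is about corrosion from one that mentions rust in passing. Judging importance, the "aboutness", is still the human reader's part.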

I think it will be a while, perhaps decades, before any software method can perform decent Conceptual Analysis.
