Polymath at Large: The man in the box is missing

kw: book reviews, nonfiction, artificial intelligence, algorithms, surveillance

Google is spying on you. So are Facebook, Twitter, and nearly every web site that you use. Never mind the NSA, the CIA, and their ilk (another 15 of them, just in the US!). Here's the rub, though. Think of the many "likes" and other clicks you make daily; if you like to write, as I do, the hundreds to thousands of words you generate; and the photos, links, and other "stuff" you send out into the ether. Is there someone who can digest it all, winnow it, and produce a comprehensive picture of just who you are?

Not a someone, but a something: an algorithm. Strictly speaking, in each such case (Google, etc.) it is a veritable forest of algorithms that analyze (gather, group, parse, summarize, mathematically rotate in n-space) all the data we generate and hand over to them.

What is an algorithm? It is a recipe. A cookbook is filled with descriptions of algorithms that a cook will follow to produce this or that dish. A computer program contains, in specialized language, recipes that the computer will follow to "cook" data to produce "dishes".
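To make the recipe comparison concrete, here is a minimal sketch in Python of a tiny data "recipe" that gathers, groups, and summarizes. The click log, the user names, and the function name summarize_clicks are all invented for illustration; no real company's pipeline is anywhere near this simple.

```python
from collections import Counter

def summarize_clicks(clicks):
    """A tiny 'recipe': gather clicks, group them by user, and summarize.

    Each click is a (user, category) pair. The result maps each user to the
    category they clicked most often. (Hypothetical example.)
    """
    per_user = {}
    # Step 1: gather the raw clicks and group them by user.
    for user, category in clicks:
        per_user.setdefault(user, Counter())[category] += 1
    # Step 2: summarize each user's group as their most frequent category.
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}

# Hypothetical click log, invented for this example.
clicks = [
    ("alice", "gardening"), ("alice", "gardening"), ("alice", "cooking"),
    ("bob", "hardware"), ("bob", "hardware"), ("bob", "books"),
]
print(summarize_clicks(clicks))  # {'alice': 'gardening', 'bob': 'hardware'}
```

The "dish" here is a one-word label per person; real systems cook far more data into far richer labels, but the steps are still a recipe someone wrote down.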

Early electronic computers were used to produce tables for predicting the flight of an artillery shell, which made it easier to aim accurately. The ballistics table was the "dish". These days, the "dish" is more likely to be something that an advertiser will use to influence you to buy something, or a political party will use to influence how you vote, or a "nonprofit" group will use to gain your support. Does any of that worry you?

David Sumpter, a professor and applied mathematician, worried about these things, and he had the tools and the standing to do some digging. How much of this is true, and how effective are these techniques? His intellectual and mathematical journey and the conclusions he drew are described in Outnumbered: From Facebook and Google to Fake News and Filter-bubbles – The Algorithms That Control Our Lives.

He describes what he found, and, to cut to the chase: while the power of the algorithms is amazing, their results are slender. The algorithms themselves are not much of a danger. Even the more powerful algorithms we can expect in the next generation or two are unlikely to be much of a danger in and of themselves. The true danger is that powerful people believe their results, even when it can be shown that the best algorithm is no more accurate than the informed opinion of an intelligent expert.

Why is that so? I must back up here. We in the computer field rather loosely talk about "putting the intelligence" into a program, but that does not make the program "artificially intelligent." In every case, we embody some aspect of human intelligence in an algorithm that a computer can perform very fast. For example, I know how to perform a Fourier analysis to find the frequencies in some kind of signal, such as a snippet of a song or a portion of a digital photo. It takes a lot of calculations (millions of them to analyze 1/100th of a second of audio). Computers can do those calculations so fast that audio spectrum analyzers running on a smartphone can produce a sound spectrum in real time (I use one to "see" bird songs). No computer "invented" Fourier analysis; a human did. So why can a knowledgeable person still outperform a mechanical "understanding" of social cues? Two things: (1) the human mind works in ways that nobody yet has a clue about; and (2) we instantly recognize similarities, which computers don't do well, while computers instantly find tiny differences, which we have a hard time with. (Just by the by, I made a 40-year career exploiting the synergy of mind and machine.)
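As a rough illustration of the frequency analysis mentioned above, here is a minimal sketch of a naive discrete Fourier transform in Python that finds the strongest frequency in a short tone. The sample rate, the 440 Hz test tone, and the function names are assumptions chosen so the arithmetic comes out cleanly; real spectrum-analyzer apps use the much faster FFT rather than this direct loop.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive discrete Fourier transform: returns |X[k]| for k = 0 .. N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = 0j
        # Correlate the signal with a complex sinusoid at frequency bin k.
        for t, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(total))
    return mags

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin, ignoring bin 0 (DC)."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    # Each bin is sample_rate / N hertz wide.
    return k * sample_rate / len(samples)

sample_rate = 4000          # assumed samples per second
n = sample_rate // 10       # 0.1 s of audio = 400 samples
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(n)]
print(dominant_frequency(tone, sample_rate))  # 440.0
```

Even this toy does 400 × 201 complex multiply-adds for a tenth of a second of sound, which hints at why the speed of the machine, not any "understanding", is what makes real-time spectra possible.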

So the problem with letting algorithms "do stuff" is that they don't do it any better than we do, and this isn't likely to change much in the next few decades. Further, because the data used to "train" the "deep learning" systems in use today (such as the one behind Google Translate) contain lots of human bias, those biases will be reflected in the results the systems produce. For example, suppose we use the trillions of sentences on Twitter to train a natural-language understanding-and-response system. Is it possible to filter out the trolls, the "hate speech", the bigotry and sexism and this-and-that-phobic utterances beforehand? If we don't, the trained system will exhibit the biases and bigotry of the data that went into it. This is a malignant example of "garbage in, garbage out."
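Here is a toy sketch of "garbage in, garbage out", assuming the simplest possible language model: one that just counts which word follows which. The three training sentences are invented and deliberately skewed; deep-learning systems are vastly more elaborate, but they share the basic property that their output statistics mirror their training data.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count which word follows which: the simplest possible 'language model'."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def predict_next(model, word):
    """Return the word seen most often after `word` in the training data."""
    counts = model.get(word.lower())  # .get avoids creating empty entries
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical, deliberately skewed training data.
corpus = [
    "the engineer said he would fix it",
    "the engineer said he was busy",
    "the engineer said she would fix it",
]
model = train_bigrams(corpus)
print(predict_next(model, "said"))  # he
```

Nothing in the code prefers one pronoun over another; the skew comes entirely from the sentences it was fed.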

Most of us experience the results of all that spying and calculation in the ads we see online, and even in the kinds of results we get from a search. I have learned to do any meaningful search in an Incognito (or InPrivate) window. My wife and I had this experience: a handle on an end table was getting loose. We found that the threads on its screws were getting stripped, probably from being bumped rather hard at some point. We looked online for handle hardware from various sources, found something we liked, and ordered enough handles to fix both end tables. Guess what? I got lots of ads for handles in Google, Facebook, and nearly everywhere else that had banner ads! I got them for a month. I thought of writing to Google, "Dudes, I bought a set of handles the same day. Y'all are way, way too late!" But what's the use? Until they think to check online purchase information, they can't know it.

I did three little experiments. First, in a Firefox Private window, I went to www.google.com and began typing "how to peel", and then I recorded the auto-complete results:

a mango
butternut squash
garlic
pearl onions
a kiwi
an orange
ginger eggs
a banana
tomatoes

When I did the same thing, logged into my Gmail account (on another tab), the first result was

a pineapple

followed by the next 8 items above, from "mango" to "banana". Yesterday, I happened to look up YouTube videos about peeling pineapples. No surprise there.
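Google does not publish how it ranks its suggestions, so the following is only a toy model of the behavior seen above, not Google's method. It assumes suggestions are ranked by how often everyone has searched each query, plus a fixed bonus for queries in a signed-in user's own recent history. The query counts, the bonus of 1000, and the function name are all invented for illustration.

```python
def autocomplete(prefix, query_counts, personal_history=(), boost=1000, limit=9):
    """Rank stored queries that start with `prefix` (hypothetical toy model).

    query_counts maps full queries to how often everyone has searched them.
    Queries in personal_history get a fixed bonus (an assumed rule), so a
    signed-in user's recent searches float to the top.
    """
    recent = set(personal_history)
    matches = [q for q in query_counts if q.startswith(prefix)]

    def score(q):
        return query_counts[q] + (boost if q in recent else 0)

    return sorted(matches, key=score, reverse=True)[:limit]

# Invented popularity counts.
counts = {
    "how to peel a mango": 900,
    "how to peel butternut squash": 800,
    "how to peel garlic": 700,
    "how to peel a pineapple": 100,
    "best books for teens": 950,
}
print(autocomplete("how to peel", counts)[0])  # how to peel a mango
print(autocomplete("how to peel", counts, ["how to peel a pineapple"])[0])  # how to peel a pineapple
```

Under these assumptions, one recent search is enough to push an otherwise unpopular suggestion to the top, while the rest of the list keeps its usual order.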

Next, I repeated the process, starting with "best books for". The auto-complete list was:

young adults
teens
3 year olds
men
toddlers
2 year olds
4 year olds
men 2018
babies

When I did the same thing "as myself", I got exactly the same list. Did you notice, as I did, the glaring omission of women or girls? That is odd, because my wife and I use the same account credentials and the same email accounts. Maybe women don't search online for books (but I'd be surprised...).

Finally, again in a private window, I entered the search term "genealogical". The first page of results was:

the definition
genealogical.com
www.dictionary.com/browse/genealogical
www.thefreedictionary.com/genealogical
delgensoc.org
[2 "top stories"]

"How genealogical sleuthing led to suspect in Warwicks joggers death" - The Providence Journal
"Genealogical Society learns about creating digital family histories" - Community Journal

www.archive.gov/research/alic/reference/genealogy.html
www.archive.gov/research/genealogy
www.ngsgenealogy.org
- not detailed
www.newyorkfamilyhistory.org

The page also included a photo-ad from ancestry.com for its software and service. After that, logged in again as myself, I entered "genealogical", with these results:

[3 ads]

www.myheritage.com/Genealogy
www.geneticsdigest.com
www.genealogy.com

the definition
genealogical.com
en.wikipedia.org/wiki/Genealogy

followed by the rest of the list above, from www.dictionary.com onward, minus the newyorkfamilyhistory one. The photo-ad for ancestry.com was the same. Are these differences significant? Not usually.

The book ends on a slightly melancholy note. Dr. Sumpter found he'd become something of a wet blanket at parties: when people started grumbling about what Google, FB, and others were doing, he had the real info, which was less flashy. Sure, if an algorithm can sell 0.1% more fribble widgets, and the market for fribble widgets is a few million units, that 0.1% could mean a few thousand more of them sold. But people want something to be a little scared about, and there's nothing scary about fractions of a percent. Even filter-bubbles turn out to be a toothless bugaboo. Nothing pops a bubble quicker than outside info from a source that isn't trying to change your mind, but is just presenting the facts as seen from elsewhere. All of us have more than one source of information. Sooner or later, any bubble we've been in will come up against "real life" from another angle.
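The widget arithmetic above can be checked in a couple of lines. This sketch assumes hypothetical market sizes of 3 and 5 million units and the 0.1% lift from the text; rounding to whole units avoids floating-point noise in the result.

```python
def extra_sales(market_size, lift_percent):
    """Extra units sold when an algorithm raises sales by lift_percent percent."""
    return round(market_size * lift_percent / 100)

# Hypothetical market sizes of "a few million" widgets, with a 0.1% lift.
for market in (3_000_000, 5_000_000):
    print(market, extra_sales(market, 0.1))
```

A few thousand extra sales can matter to the widget maker's bottom line while being invisible to any individual shopper, which is exactly why the real story makes poor party talk.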

I have to tell this story, a bittersweet one. In 1980 the first Chinese students were allowed to come to American colleges. A few years later, some Chinese nationals who had relatives in the US were allowed to visit them. A Chinese friend of ours, a man in his seventies, was visited by his sister. For a few days, she said a lot about how good the Chinese system was, how Communism was better than Democracy, and so forth. I understand, of course, that in China, speaking any other way could get a person in trouble. Our friend and his wife didn't argue with her. They just took her along the next time they went grocery shopping, at a Walmart. The dear lady saw the rows and rows of fresh produce, the bread, and everything else, and began to cry. It must have been rather hard for her to return to China a few weeks later.

Will artificial intelligence make the algorithms work any better? Many of them are already being called AI, but that is hype; they are just the rapid application of human intelligence. Anyway, we may be in a waning cycle of hype about artificial intelligence. Computer software is in no danger of replacing us any time soon. Before there were electronic computers, people cleverly devised mechanical aids of many kinds to speed up the calculations we needed. Napier's Bones were devised in the early 1600s to simplify multiplication and division. Slide rules (I still have a few) preceded the pocket calculator by more than three centuries, and could also perform trigonometric and logarithmic calculations. I have an old copy of Machinery's Handbook, which contains "log tables" in case you need greater accuracy than a slide rule can produce. The first mainframe computer I used was no more accurate than the table of logarithms; it was just faster.

So far, all these things are tools. The algorithms applied to our social data are tools used by marketers and propagandists. If we keep these facts in mind, we'll have better resistance to sales pitches and to "fake news". And we'll also be able to turn the tools to our own benefit: sometimes the ads they show us are for something that we really can use, and we didn't even need to go looking for it!

Friday, February 08, 2019