Friday, June 10, 2011

More mileage per milestone

kw: business, work

I've just finished a week-long slog through a file of more than 600 documents that needed to be quickly indexed. It was one of those projects that a former supervisor was talking about when she said, "You don't have to like it, you just have to do it."

I am blessed with a job that is usually enjoyable. That generally includes the indexing. There is always something interesting to read along the way, for the reports I index are usually a result of hot research projects. From time to time, however, there isn't the luxury of time for a report-by-report, detailed analysis. When we get a whole slug of related reports, such as from a technical conference, or when a collection is to be archived, bulk techniques are needed. Clicking along at ten or twenty reports per day won't do.

This week was such a case: the collected proceedings of a technical conference. The authors had been required to add a list of suggested key terms to their abstracts. A few of them are familiar with our controlled vocabulary for indexing, but most just put whatever was meaningful to them. It was my job to turn these into "official" terms, plus pick a term or two from the title or abstract that the author may have passed over, but that we IS folks prefer to capture.

One helpful technique was to sort the suggested key terms and reduce them to unique strings. That cut 2,880 items to 1,850. A check using various stemming techniques against the official concept list netted 400 hits, so I only had to run through 1,450 "by hand". Four days. Then a day spent running through the abstracts for "missed" terms, sorting everything back in, and I am done. This'll get my numbers up for the month! I'm going home for a nap.
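
For the technically curious, the dedupe-then-stem pass described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling actually used: the function names, the sample terms, and the crude suffix-stripping rule are all hypothetical stand-ins for the "various stemming techniques" and the real controlled vocabulary.

```python
import re


def normalize(term):
    """Lowercase, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", term.strip().lower())


def crude_stem(word):
    """Hypothetical, very rough plural stripping; a real stemmer would do better."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stem_key(term):
    """Stem every word of a term so that variants share one lookup key."""
    return " ".join(crude_stem(w) for w in normalize(term).split())


def split_terms(suggested, official):
    """Return (unique terms, stem-matched terms, terms left to do by hand)."""
    # Step 1: reduce the raw author-supplied list to unique strings.
    unique = sorted({normalize(t) for t in suggested if t.strip()})
    # Step 2: index the official vocabulary by its stemmed form.
    official_by_stem = {stem_key(t): t for t in official}
    matched, by_hand = {}, []
    for term in unique:
        hit = official_by_stem.get(stem_key(term))
        if hit is not None:
            matched[term] = hit      # automatic hit against the official list
        else:
            by_hand.append(term)     # needs a human indexer
    return unique, matched, by_hand


# Made-up sample data, purely for illustration.
suggested = ["Polymers", "polymers ", "Thin Films", "thin film",
             "nanotubes", "Catalysis", "my pet theory"]
official = ["Polymer", "Thin films", "Catalysis", "Nanotube"]

unique, matched, by_hand = split_terms(suggested, official)
print(len(suggested), "suggested ->", len(unique), "unique")
print(len(matched), "matched automatically;", len(by_hand), "left by hand")
```

The same bookkeeping explains the numbers above: 1,850 unique strings minus 400 automatic hits leaves the 1,450 that had to be handled by hand.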
