kw: blogs, blogging, spider scanning, inferences
I just opened this blog to write a new post and first checked the statistics. Each day for the past week or so there has been a burst of activity between 8am and 10am local time (EDT), amounting to 15,000 to 20,000 views. Today I saw a sudden burst of extra activity that started about 9am and was still going on. Here is the "Now" view of the stats for the past two hours (~9:30-11:30):
I just barely caught this in time to see the tail end of the pattern. Prior to 11am, the hit rate was around 2.5 per second, or about 150 per minute (~9,000 per hour), according to the "Past 24 hours" view. Here we see that after 11 it dropped to about half that for twenty minutes, then went to the more ordinary "background" rate of just one view or fewer per minute. The total number of hits in this period is about 15,400.
I infer that when a spider (whether of the spying type or the AI training type) encounters this blog, it usually snarfs up what it can from the home page and then goes elsewhere. In some cases a spider may go deeper, because 700 page views in two hours is still at least ten times the "honest" page view rate, the rate of humans actually going to a page to read it.
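As a rough cross-check of the figures above, here is a small Python sketch that converts the burst rate between units and adds up the views implied by each stretch of the two-hour window. The segment lengths (90, 20 and 10 minutes) are assumptions read off the approximate times quoted in the post, each rate is treated as constant within its segment, and the names `per_minute`, `per_hour` and `segments` are made up for illustration.

```python
# Burst rate quoted in the post, in views per second.
BURST_PER_SEC = 2.5


def per_minute(per_second):
    """Convert a per-second rate to a per-minute rate."""
    return per_second * 60


def per_hour(per_second):
    """Convert a per-second rate to a per-hour rate."""
    return per_second * 3600


burst_per_min = per_minute(BURST_PER_SEC)   # 150 views per minute
burst_per_hr = per_hour(BURST_PER_SEC)      # 9,000 views per hour

# (minutes, views per minute) for each stretch of the ~9:30-11:30 window.
# The boundaries are assumptions based on the approximate times in the post.
segments = [
    (90, burst_per_min),       # ~9:30 to 11:00 at the full burst rate
    (20, burst_per_min / 2),   # 11:00 to ~11:20 at about half that
    (10, 1.0),                 # ~11:20 to 11:30 at the background rate (upper bound)
]

estimated_total = sum(minutes * rate for minutes, rate in segments)

print(f"burst rate: {burst_per_min:.0f}/min, {burst_per_hr:,.0f}/hour")
print(f"estimated views in window: {estimated_total:,.0f}")
```

Under these assumptions the estimate comes to about 15,000 views, which is in the same range as the roughly 15,400 reported by the stats page; the gap is small enough to be explained by the window and rates being approximate.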
Considering that the oldest posts in this blog are 21 years old, it makes sense for an AI training spider to concentrate on newer material.
