First run done
Right, we have some ecosystem results here. 639 blogs, although a lot of the UserLand-hosted ones (editthispage.com etc.) won't have been counted because my crawler was blocked for going too fast on the first run. The next thing I'll do is get it to group pages by IP address and only fetch, say, one page every minute from a single IP, which should let me get the UserLand ones OK. I wonder what the 'nasty crawler' threshold is on the server.
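Roughly, the per-IP throttling I have in mind looks something like this (a Python sketch only; the function and variable names are placeholders, though the one-minute interval is the figure mentioned above):

```python
# Sketch of per-IP polite fetching: group URLs by the IP they resolve to,
# and never hit the same IP more than once a minute. Names are illustrative.
import socket
import time
from urllib.parse import urlparse
from urllib.request import urlopen

FETCH_INTERVAL = 60   # seconds between fetches to any single IP
last_fetch = {}       # IP address -> time of the last fetch to it

def fetch_politely(url):
    host = urlparse(url).hostname
    # Group by IP rather than hostname, so thousands of blogs hosted on
    # one server (e.g. the editthispage.com crowd) share one rate limit.
    ip = socket.gethostbyname(host)
    wait = FETCH_INTERVAL - (time.time() - last_fetch.get(ip, 0))
    if wait > 0:
        time.sleep(wait)
    last_fetch[ip] = time.time()
    return urlopen(url).read()
```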
(Tech note: I cache pages, so I'll only ever fetch any given page once. I shouldn't stress any single site at all, but servers which host thousands of blogs will notice quite a few hits. That will change.)
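For what it's worth, the cache is nothing fancier than this kind of thing (again a sketch, not the crawler's actual code):

```python
# Sketch of the page cache: each URL is fetched from the network at most
# once; later lookups come straight from memory. Names are illustrative.
from urllib.request import urlopen

page_cache = {}   # url -> raw page body

def get_page(url):
    if url not in page_cache:
        page_cache[url] = urlopen(url).read()
    return page_cache[url]
```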