Last Monday, I suspected that Googlebot didn't like the B word in a URL. A close examination of my access_log showed that Googlebot fetched the index every day, but never requested the other pages linked from that page. So I decided to rename my blog to MexIT, a combination of Mexico and Information Technology.
[21/Apr/2004:23:45:01 -0700] "GET /blog/ HTTP/1.0"
[23/Apr/2004:02:03:06 -0700] "GET /blog/ HTTP/1.0"
[24/Apr/2004:00:58:56 -0700] "GET /blog/ HTTP/1.0"
[25/Apr/2004:00:13:43 -0700] "GET /blog/ HTTP/1.0"
[26/Apr/2004:00:31:08 -0700] "GET /blog/ HTTP/1.0"
And indeed, yesterday Googlebot fetched the new index, and today it fetched the pages linked from the MexIT index page.
[27/Apr/2004:00:10:00 -0700] "GET /mexit/ HTTP/1.0"
[27/Apr/2004:23:27:34 -0700] "GET /mexit/ HTTP/1.0"
[28/Apr/2004:03:50:38 -0700] "GET /mexit/2004/04/16/ HTTP/1.0"
[28/Apr/2004:04:24:51 -0700] "GET /mexit/2004/04/18/ HTTP/1.0"
[28/Apr/2004:04:38:13 -0700] "GET /mexit/2004/04/11/ HTTP/1.0"
[28/Apr/2004:05:15:59 -0700] "GET /mexit/2004/04/12/ HTTP/1.0"
[28/Apr/2004:05:23:08 -0700] "GET /mexit/2004/04/10/ HTTP/1.0"
[28/Apr/2004:05:28:01 -0700] "GET /mexit/2004/04/17/ HTTP/1.0"
[28/Apr/2004:05:34:32 -0700] "GET /mexit/2004/04/13/ HTTP/1.0"
[28/Apr/2004:05:42:56 -0700] "GET /mexit/2004/04/14/ HTTP/1.0"
[28/Apr/2004:05:55:39 -0700] "GET /mexit/2004/04/15/ HTTP/1.0"
[28/Apr/2004:05:56:25 -0700] "GET /mexit/ HTTP/1.0"
[28/Apr/2004:06:06:42 -0700] "GET /mexit/2004/04/19/ HTTP/1.0"
Note that this is only a suspicion: Googlebot may not crawl URLs containing the B word (maybe only in combination with the date in the URL), or it may just crawl them more slowly than ordinary pages. I have no real proof other than the access_log snippets above.
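For anyone who wants to check their own log the same way, here is a minimal sketch in Python of how such snippets could be collected. It assumes the log is in Apache's "combined" format, with the user agent as the last quoted field; the snippets above show only the timestamp and request, so that layout is an assumption. The function name googlebot_hits and the sample lines (IP address, status code, response size) are made up for illustration; only the timestamps and paths are taken from the snippets above.

```python
import re
from collections import defaultdict

# Assumed layout: Apache "combined" log format. Only the timestamp and the
# request line appear in the post; everything else here is an assumption.
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Map each requested path to the timestamps at which Googlebot fetched it."""
    hits = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        # Keep only requests whose user agent string mentions Googlebot.
        if match and "googlebot" in match.group("agent").lower():
            hits[match.group("path")].append(match.group("time"))
    return dict(hits)


if __name__ == "__main__":
    # Hypothetical sample lines; IP, status and size are placeholders.
    sample = [
        '192.0.2.1 - - [27/Apr/2004:00:10:00 -0700] "GET /mexit/ HTTP/1.0" '
        '200 1234 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
        '192.0.2.1 - - [28/Apr/2004:03:50:38 -0700] "GET /mexit/2004/04/16/ HTTP/1.0" '
        '200 1234 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
        '192.0.2.7 - - [28/Apr/2004:03:51:00 -0700] "GET /mexit/ HTTP/1.1" '
        '200 1234 "-" "Mozilla/4.0 (compatible)"',
    ]
    for path, times in sorted(googlebot_hits(sample).items()):
        print(path, len(times), times[0])
```

To run it against a real log, pass an open file instead of the sample list, for example googlebot_hits(open("access_log")). A path that is linked from the index but never shows up in the result would be a candidate for the behaviour described above, though that alone proves nothing about the cause.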
When I first suspected this, I used Google to find some information about the spidering of blogs, and read old news about Google planning to remove blogs from its Internet search results.