Don't use the B word

Wednesday, April 28, 2004

Last Monday, I began to suspect that Googlebot doesn't like the B word in a URL. A close look at my access_log showed that Googlebot kept fetching the index page but never requested any of the pages linked from it. So I decided to rename my blog to MexIT, a combination of Mexico and Information Technology.

[21/Apr/2004:23:45:01 -0700] "GET /blog/ HTTP/1.0"
[23/Apr/2004:02:03:06 -0700] "GET /blog/ HTTP/1.0"
[24/Apr/2004:00:58:56 -0700] "GET /blog/ HTTP/1.0"
[25/Apr/2004:00:13:43 -0700] "GET /blog/ HTTP/1.0"
[26/Apr/2004:00:31:08 -0700] "GET /blog/ HTTP/1.0"

And indeed, yesterday Googlebot fetched the new index, and today it fetched the pages linked from the MexIT index page.

[27/Apr/2004:00:10:00 -0700] "GET /mexit/ HTTP/1.0"
[27/Apr/2004:23:27:34 -0700] "GET /mexit/ HTTP/1.0"
[28/Apr/2004:03:50:38 -0700] "GET /mexit/2004/04/16/ HTTP/1.0"
[28/Apr/2004:04:24:51 -0700] "GET /mexit/2004/04/18/ HTTP/1.0"
[28/Apr/2004:04:38:13 -0700] "GET /mexit/2004/04/11/ HTTP/1.0"
[28/Apr/2004:05:15:59 -0700] "GET /mexit/2004/04/12/ HTTP/1.0"
[28/Apr/2004:05:23:08 -0700] "GET /mexit/2004/04/10/ HTTP/1.0"
[28/Apr/2004:05:28:01 -0700] "GET /mexit/2004/04/17/ HTTP/1.0"
[28/Apr/2004:05:34:32 -0700] "GET /mexit/2004/04/13/ HTTP/1.0"
[28/Apr/2004:05:42:56 -0700] "GET /mexit/2004/04/14/ HTTP/1.0"
[28/Apr/2004:05:55:39 -0700] "GET /mexit/2004/04/15/ HTTP/1.0"
[28/Apr/2004:05:56:25 -0700] "GET /mexit/ HTTP/1.0"
[28/Apr/2004:06:06:42 -0700] "GET /mexit/2004/04/19/ HTTP/1.0"
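
For anyone who wants to check their own access_log the same way, here is a minimal sketch that groups Googlebot's GET requests by day. It assumes the Apache "combined" log format, in which the user agent is the last quoted field on each line; the snippets above are trimmed and don't show that field. The function name googlebot_fetches is made up for this example.

```python
import re
from collections import defaultdict

# Assumption: Apache "combined" log format, where the user agent
# is the last quoted field on the line.
LINE_RE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]+\] '   # [27/Apr/2004:00:10:00 -0700]
    r'"GET (?P<path>\S+) HTTP/[\d.]+"'          # "GET /mexit/ HTTP/1.0"
    r'.*"(?P<agent>[^"]*)"\s*$'                 # last quoted field = user agent
)


def googlebot_fetches(lines):
    """Map each day to the list of paths Googlebot requested on that day."""
    by_day = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        # Skip lines that don't parse or that come from other clients.
        if match and "Googlebot" in match.group("agent"):
            by_day[match.group("day")].append(match.group("path"))
    return dict(by_day)
```

Passing it the lines of an access_log (for example from an open file object) returns a dictionary keyed by day. A day that lists only the index path means Googlebot fetched the index and nothing else, which is the pattern I saw before the rename.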

Note that this is only a suspicion: Googlebot may not crawl URLs containing the B word (perhaps only when combined with a date in the URL), or it may just crawl them more slowly than ordinary pages. I have no real proof other than the access_log snippets above.

When I first suspected this, I used Google to look for information about the spidering of blogs, and read old news about Google planning to remove blogs from its Internet search results.

