The Perl program below processes an access log file generated by the Apache HTTP server and produces output that can be imported into a spreadsheet program such as Calc, which comes with OpenOffice.org, or Microsoft Excel.
Hi Roy,
I only know awstats.pl from the "hack" attempts I see now and then on my site, so I have no idea whether it's comparable. My script produces a very fine-grained breakdown for Googlebot only, since that is what interests me most.
I might add a thing or two to this script, but don't expect too much (it will stay a small tool).
It looks like it doesn't work b/c the regex isn't matching googlebot.com. In my access log, it looks like this:
66.249.72.36 - - [15/Sep/2006:03:21:21 -0400] "GET /shells/printer.phtml?id=5112 HTTP/1.1" 200 5084 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Hi Jason, it doesn't work because the Perl program assumes that the first column contains host names, not IP addresses. I will look into a fix for this.
I was able to get it to work by tweaking the regexp:
$line =~ m!
^.+?
\[(\d\d)/(\w{3})/(\d{4})(?::\d\d){3}.+?\]
\s"GET\s(.+?)\sHTTP/\d\.\d"
\s(\d{3})
.+
Googlebot/
!x;
That should work for domain names, too.
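To check the tweaked pattern, here is a minimal sketch that runs it against the log line quoted above and against the same line with a host name in the first column. The helper name parse_googlebot_line and the host name crawl-66-249-72-36.googlebot.com are assumptions made for illustration, not part of the original script; the pattern matches on the user agent string, so the first column can hold either form.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns
# (day, month, year, path, status) for a Googlebot GET request,
# or an empty list if the line does not match.
sub parse_googlebot_line {
    my ($entry) = @_;
    return $entry =~ m!
        ^.+?                                       # host name or IP address
        \[(\d\d)/(\w{3})/(\d{4})(?::\d\d){3}.+?\]  # date, time and zone
        \s"GET\s(.+?)\sHTTP/\d\.\d"                # request path
        \s(\d{3})                                  # status code
        .+
        Googlebot/                                 # user agent names Googlebot
    !x;
}

# The log line from the comment above, joined into a single line.
my $line = '66.249.72.36 - - [15/Sep/2006:03:21:21 -0400] '
         . '"GET /shells/printer.phtml?id=5112 HTTP/1.1" 200 5084 "-" '
         . '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"';

# Same line with an assumed host name in place of the IP address.
my $by_name = $line;
$by_name =~ s/^\S+/crawl-66-249-72-36.googlebot.com/;

for my $entry ($line, $by_name) {
    my @fields = parse_googlebot_line($entry);
    print join(',', @fields), "\n" if @fields;
}
```

Both lines should yield the same comma-separated fields (day, month, year, path, status), which can be pasted into a spreadsheet.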
On SourceForge you will find awstats.pl, which does similar things and has a good per-crawler breakdown. That said, only consider awstats.pl if you want to reuse a few bits and extend the functionality, of course.
All projects begin small.