Generating an HTML report with GoAccess
August 28, 2016
This month I started blogging again after a hiatus of over 4 years. Curious whether my site was already getting more traffic, and to which pages, I wanted to make a report from my site's access log.
I already had some experience with AWStats, a free log file analyzer, but was looking for something simpler. After some Google research I came upon GoAccess - Visual Web Log Analyzer, which I installed on my virtual Ubuntu 15.10 as follows:
sudo apt-get install goaccess
I had already skimmed the HTML Reports section of How To Install and Use GoAccess Web Log Analyzer with Apache on Debian 7 | DigitalOcean, so I thought the following would generate the desired HTML report:
goaccess -f johnbokma-com-access.log -a >report.html
But this reported the following fatal error:
GoAccess - version 0.8.3 - Oct 24 2014 17:27:51
Fatal error has occurred
Error occured at: parser.c - parse_log - 1053
No date format was found on your conf file.
After some struggling, including using find to look for the default configuration (no luck) and downloading the latest version of the GoAccess source and copying its configuration (which failed because it supports more options), I created a goaccess.conf file with the following options:
date-format %d/%b/%Y
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
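For anyone wondering what that format string means, here is how I read it against an Apache combined log line, based on my understanding of the GoAccess man page. The sample line and its address are made up for illustration. The heredoc is just one way to create the file; any editor works equally well.

```shell
# Illustrative combined-format line (made up, not taken from my real log):
#   203.0.113.5 - - [28/Aug/2016:10:15:32 +0200] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
#
# How I read the specifiers in log-format:
#   %h     client host                      203.0.113.5
#   %^     ignore everything up to the "["  - -
#   %d     date, parsed using date-format   28/Aug/2016
#   %t     time                             10:15:32
#   %^     ignore everything up to the "]"  +0200
#   "%r"   request line                     GET /index.html HTTP/1.1
#   %s     status code                      200
#   %b     response size in bytes           5120
#   "%R"   referrer                         -
#   "%u"   user agent                       Mozilla/5.0

# Write the two options to goaccess.conf in the current directory.
cat > goaccess.conf <<'EOF'
date-format %d/%b/%Y
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
EOF
```

The quoted 'EOF' keeps the shell from expanding anything inside the heredoc, so the percent signs and double quotes end up in the file unchanged.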
Running the following command resulted in the desired HTML report:
goaccess -p goaccess.conf -f johnbokma-com-access.log -a > report.html
Upon viewing the HTML report I noticed two issues. First, two IP addresses had been hitting my site very often, and after some research I decided to use iptables to block the whole range both belong to.
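The same heavy hitters can be cross-checked straight from the access log. The following is a minimal sketch, assuming the client address is the first whitespace-separated field of each line. The function name top_ips is a hypothetical helper of my own, and the range in the commented iptables line is a placeholder from the documentation address space, not the range I actually blocked.

```shell
# Print the N most frequent client IPs in an access log, busiest first.
# top_ips is a hypothetical helper name; it is not part of GoAccess.
top_ips() {
    log_file=$1
    count=${2:-10}   # default to the top 10 when no count is given
    awk '{ print $1 }' "$log_file" | sort | uniq -c | sort -rn | head -n "$count"
}

# Example use:
#   top_ips johnbokma-com-access.log 5
#
# Blocking a whole range then looks something like this
# (203.0.113.0/24 is a placeholder range, requires root):
#   sudo iptables -A INPUT -s 203.0.113.0/24 -j DROP
```

Each output line is a hit count followed by the address, so the two busiest clients appear at the top.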
The second issue was that the program I wrote yesterday to generate the RSS feed for this site used incorrect links; requests for those links showed up in the "HTTP 404 Not Found URLs" pane of the HTML report. A small modification to the Perl program I use to generate this blog fixed the broken links in the RSS feed.
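To double-check after a fix like this, the 404s can also be pulled from the log without regenerating the report. This is a rough sketch, assuming the combined log format with no spaces in the requested URL or the user field, so that the path is field 7 and the status code is field 9. The name not_found_urls is a hypothetical helper, not something GoAccess provides.

```shell
# List requested paths that returned 404, most frequent first.
# Assumes combined log format: field 7 is the path, field 9 the status code.
# not_found_urls is a hypothetical helper name.
not_found_urls() {
    awk '$9 == 404 { print $7 }' "$1" | sort | uniq -c | sort -rn
}

# Example use:
#   not_found_urls johnbokma-com-access.log
```

If the broken feed links are gone, they should stop turning up in this list for log entries written after the fix.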