John Bokma's Hacking & Hiking

Building GoAccess from source on Ubuntu

May 30, 2019

Yesterday I built GoAccess, an open source web log analyzer, from source on Ubuntu 16.04.1 because the version available via the repository was severely outdated. Today I decided to write down my notes as a blog post and generate a few reports to illustrate this useful program.

Removing an old version of GoAccess

As I had already installed GoAccess via apt, I removed this old version first:

sudo apt-get purge goaccess

Downloading, building, and installing GoAccess

Change to your home directory and use wget as follows to download the stable version 1.3 of GoAccess:

cd ~
wget https://tar.goaccess.io/goaccess-1.3.tar.gz

Install the required development libraries:

sudo apt install -y libgeoip-dev
sudo apt install -y libncursesw5-dev

Install the tools to compile GoAccess:

sudo apt install -y make gcc
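Before configuring, it can help to confirm that all four packages really are installed. The following is a minimal sketch that relies on dpkg-query, so it assumes a Debian-based system such as Ubuntu; the helper name missing_packages is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Print each package name (one per line) that dpkg does not report as installed.
# missing_packages is a hypothetical helper name, not part of any tool.
missing_packages() {
  local pkg
  local -a missing=()
  for pkg in "$@"; do
    # ${Status} is expanded by dpkg-query itself, hence the single quotes.
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
      missing+=("$pkg")
    fi
  done
  if ((${#missing[@]} > 0)); then
    printf '%s\n' "${missing[@]}"
  fi
}

# The packages installed in the steps above.
todo=$(missing_packages libgeoip-dev libncursesw5-dev make gcc)
if [ -n "$todo" ]; then
  echo "Still missing:"
  echo "$todo"
else
  echo "All build dependencies are installed."
fi
```

Anything it prints can be passed to sudo apt install -y before running configure.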

Unpack the tarball and change into the newly created directory:

tar -xzvf goaccess-1.3.tar.gz
cd goaccess-1.3/

Next, configure the build process:

./configure --enable-utf8 --enable-geoip=legacy

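I used the following options to configure:

--enable-utf8: compile with wide character support, which requires the ncursesw library installed earlier.
--enable-geoip=legacy: compile with GeoLocation support using MaxMind's legacy GeoIP databases (libgeoip).

See Installation at the GoAccess site for more information on the other options available.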

Next, run make to compile a version of GoAccess in the current working directory:

make

If all went well, install GoAccess as follows:

sudo make install

Important: if goaccess was installed via apt earlier on and has been run at least once in the current shell, bash remembers the path to that version in a hash table. Because make install puts the binary in a different directory (/usr/local/bin by default, instead of /usr/bin), bash has to forget the old path; reset the hash table as follows:

hash -r
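To see the hash table problem in isolation, the following bash sketch uses a throwaway command with the hypothetical name demo-tool instead of goaccess. It runs the command once so bash remembers its path, removes it, installs a new copy in a directory that comes earlier on the PATH, and shows that bash keeps trying the remembered path until hash -r is run. It assumes bash specifically, since hash -t is a bash option.

```shell
#!/usr/bin/env bash
# Demonstrates the stale-hash problem with a hypothetical command "demo-tool".
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/old" "$tmp/new"

# "Old" install: the only demo-tool on PATH lives in $tmp/old.
printf '#!/bin/sh\necho old\n' > "$tmp/old/demo-tool"
chmod +x "$tmp/old/demo-tool"
PATH="$tmp/new:$tmp/old:$PATH"   # assigning PATH also empties bash's hash table

demo-tool                        # prints "old"; bash remembers $tmp/old/demo-tool
hash -t demo-tool                # shows the remembered path

# Simulate "apt-get purge" followed by "make install" into another directory.
rm "$tmp/old/demo-tool"
printf '#!/bin/sh\necho new\n' > "$tmp/new/demo-tool"
chmod +x "$tmp/new/demo-tool"

# bash still tries the remembered path, which no longer exists.
if demo-tool; then stale_status=0; else stale_status=$?; fi
echo "exit status with stale hash: $stale_status"

hash -r                          # forget every remembered path
demo-tool                        # prints "new": found again via a PATH search
```

On a real system, type -a goaccess lists every goaccess on the PATH in search order, which is a quick way to check that the freshly built binary (in /usr/local/bin, assuming the default install prefix) is the one being picked up.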

Clean up by leaving the build directory and deleting it along with the downloaded tarball:

cd ..
rm -rf goaccess-1.3
rm goaccess-1.3.tar.gz
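For a repeat build, the steps above can be collected into a single bash function. This is a sketch rather than an official install script: it assumes the same Ubuntu packages and configure options used in this post, the function name build_goaccess is hypothetical, and it assumes other releases follow the same goaccess-VERSION.tar.gz naming at tar.goaccess.io.

```shell
#!/usr/bin/env bash
# Sketch: the build steps from this post collected into one function.
# build_goaccess is a hypothetical name. The body uses ( ... ) so it runs in
# a subshell: the cd commands and set -e do not leak into the calling shell.
build_goaccess() (
  set -euo pipefail
  local version="${1:-1.3}"          # defaults to the version used in this post
  local name="goaccess-${version}"

  # Development libraries and build tools.
  sudo apt install -y libgeoip-dev libncursesw5-dev make gcc

  # Download and unpack in the home directory (assumed URL pattern).
  cd ~
  wget "https://tar.goaccess.io/${name}.tar.gz"
  tar -xzf "${name}.tar.gz"
  cd "${name}"

  # Configure, compile, and install.
  ./configure --enable-utf8 --enable-geoip=legacy
  make
  sudo make install

  # Clean up the build directory and the tarball.
  cd ..
  rm -rf "${name}" "${name}.tar.gz"
)

# Usage (run each line on its own):
#   build_goaccess 1.3
#   hash -r
```

Because the function runs in a subshell, hash -r has to be run in your own shell after it returns. Calling the function as a plain command, rather than as the left side of && or ||, also keeps set -e effective inside it.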

Examples of GoAccess output

I downloaded the access log of my tumblelog Plurrrr and created an HTML report as follows:

goaccess access.log -o report.html --log-format=COMBINED

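I also ran goaccess in a terminal with a black background. The -c option shows a configuration dialog at startup in which the log, date, and time formats can be selected: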

goaccess access.log -c

The screenshots below show samples of each report.

Unique visitors per day, including spiders (HTML).
Unique visitors per day, including spiders (Terminal).
Static requests (HTML).
Static requests (Terminal).

From the above it's clear that my tumblelog suddenly got a lot of traffic on the 25th of May, when I wrote a comment on Hacker News about my blog and how it is generated statically.
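A spike like this can be cross-checked without GoAccess. The sketch below counts requests per day, not unique visitors as GoAccess reports them. It assumes the log is in Combined Log Format, where the fourth whitespace-separated field looks like [25/May/2019:13:55:36; the function name requests_per_day is hypothetical.

```shell
#!/usr/bin/env bash
# Count requests per day in an access log in Combined Log Format.
# requests_per_day is a hypothetical helper name.
requests_per_day() {
  awk '
    BEGIN {
      # Map month abbreviations to two-digit numbers.
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) month[names[i]] = sprintf("%02d", i)
    }
    {
      # $4 is "[dd/Mon/yyyy:hh:mm:ss"; characters 2-12 hold the date.
      split(substr($4, 2, 11), d, "/")         # d[1]=day, d[2]=month, d[3]=year
      count[d[3] "-" month[d[2]] "-" d[1]]++   # ISO date, so plain sort is chronological
    }
    END { for (day in count) print day, count[day] }
  ' "$1" | sort
}

# Usage: requests_per_day access.log
```

Each output line holds a date and the number of requests logged on that day, in the server's local time as written to the log.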

You can also clearly see that the front page has several images, which are all loaded at the same time (third screenshot).
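The most requested static files can likewise be listed with awk. This sketch assumes Combined Log Format, where the request path is the seventh whitespace-separated field, and uses its own list of file extensions; GoAccess has its own notion of what counts as a static file, so the numbers need not match its report exactly. The function name top_static is hypothetical.

```shell
#!/usr/bin/env bash
# List the most requested static files: "count path", highest count first.
# top_static is a hypothetical helper; the extension list is an assumption.
top_static() {
  awk '$7 ~ /\.(jpe?g|png|gif|svg|webp|ico|css|js)(\?.*)?$/ { hits[$7]++ }
       END { for (p in hits) print hits[p], p }' "$1" |
    sort -k1,1nr -k2,2 |
    head -n "${2:-10}"   # second argument: how many lines to show (default 10)
}

# Usage: top_static access.log 20
```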
