Docker for Mac Named Volume Speed Penalty
September 21, 2016
Summary
Based on some tests I did today, importing a 10MB gzip-compressed MySQL
database dump, it seems that using a named volume is over twelve times
slower than using a bind mount on Docker for Mac version 1.12.1. When a
named volume is used, the data is stored in a virtual hard disk,
Docker.qcow2. Besides the significant speed penalty, this to me also
has the disadvantages of a higher risk of data loss, since everything
is in a single file, and of making it more of a hassle to restore
individual files from backup into the virtual hard disk.
Introduction
Yesterday I experimented a bit with a Docker MySQL container. I decided to check out a named volume first:
$ docker run --detach --name mysql-db --env MYSQL_ROOT_PASSWORD=S3CR3T \
--volume mysql-data:/var/lib/mysql mysql
Somehow I was expecting a mysql-data directory to show up on my Mac
mini, still running "El Capitan". So, I used find to hunt down this
directory in ~/Library. Nothing found. Next, I inspected the mysql-db
container:
$ docker inspect mysql-db
...
    "Mounts": [
        {
            "Name": "mysql-data",
            "Source": "/var/lib/docker/volumes/mysql-data/_data",
            "Destination": "/var/lib/mysql",
            "Driver": "local",
            "Mode": "z",
            "RW": true,
            "Propagation": "rprivate"
        }
    ],
...
But sudo ls /var/lib/ didn't show a docker folder. Then I realised
that since "Docker for Mac" uses a virtual machine, the named volume
must be created on a virtual drive. So I used screen to connect to
the virtual machine as follows:
screen \
~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
Note: if you don't see a prompt, press Enter.
I logged in as root, which requires no password. Running ls -al with
the path reported by docker inspect did give the expected result:
moby:~# ls -al /var/lib/docker/volumes/mysql-data/_data/
total 188452
drwxr-xr-x 5 999 ping 4096 Sep 20 23:25 .
drwxr-xr-x 3 root root 4096 Sep 20 23:25 ..
-rw-r----- 1 999 ping 56 Sep 20 23:25 auto.cnf
-rw-r----- 1 999 ping 1329 Sep 20 23:25 ib_buffer_pool
-rw-r----- 1 999 ping 50331648 Sep 20 23:25 ib_logfile0
-rw-r----- 1 999 ping 50331648 Sep 20 23:25 ib_logfile1
-rw-r----- 1 999 ping 79691776 Sep 20 23:25 ibdata1
-rw-r----- 1 999 ping 12582912 Sep 20 23:25 ibtmp1
drwxr-x--- 2 999 ping 4096 Sep 20 23:25 mysql
drwxr-x--- 2 999 ping 4096 Sep 20 23:25 performance_schema
drwxr-x--- 2 999 ping 12288 Sep 20 23:25 sys
Note: to quit the screen session, press Ctrl+A followed by
Ctrl+\ and answer "Really quit and kill all your windows" with "y";
see the last part of Getting Started with Docker for Mac.
The virtual hard disk is /dev/vda2 and the actual data is stored in
the file Docker.qcow2 on the host computer, in my case a Mac mini:
$ cd ~
$ ls -hl Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
total 2251328
-rw-r--r-- 1 john staff 1.1G Sep 20 18:13 Docker.qcow2
:
:
I don't consider it safe to store all database-related files together
in a virtual disk that will be backed up as a single large file. So I
decided to check out a bind mount: mounting a directory on OS X onto a
path in the MySQL container. This way I would have direct access to
all the database-related files on OS X, which would make recovery from
backup much easier in case of a mishap. But which method was faster?
As I wanted to import a very large database dump, 1.2G compressed with
gzip -9, speed was very important to me. Since I had already created a
container with a named volume using:
$ docker run --detach --name mysql-db --env MYSQL_ROOT_PASSWORD=mypassword \
--volume mysql-data:/var/lib/mysql mysql
I decided to first test the speed of storing data in a virtual hard
disk. After a few attempts, which gave me the impression that this
method would be slow, I ran the import using time to measure how long
it would take.
$ time (gzip -dc big-data-dump.sql.gz |\
docker exec --interactive mysql-db \
bash -c 'mysql -uroot -p$MYSQL_ROOT_PASSWORD')
mysql: [Warning] Using a password on the command line interface can be insecure.
Note that I use bash so I can use the environment variable
$MYSQL_ROOT_PASSWORD, which is set to the actual password inside the
container, instead of exposing the actual password on the host's
command line. If there is an easier way to do this, let me know.
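How the quoting works can be sketched without a container; below, a local bash stands in for the one started by docker exec, and the exported variable simulates the container's environment:

```shell
# Single quotes keep the host shell from expanding the variable; the
# literal string reaches the inner bash, which expands it from its own
# environment. Exporting here simulates the container's environment.
export MYSQL_ROOT_PASSWORD=S3CR3T
result=$(bash -c 'echo "$MYSQL_ROOT_PASSWORD"')
echo "$result"
```

With double quotes instead, the host shell would expand the variable before bash ever ran, and it would expand to nothing since the variable is only set inside the container.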
This morning, the process was still running. A quick check using du -hs
showed that about 24% was done. I cancelled the process; it had been
running for close to 14 hours, which was way too long. As I had
imported a smaller, but still large, version of this database before
in a Parallels virtual machine, I knew that this was excessive. So I
decided to first do some tests with a much smaller database dump.
Edit: there seems to be an issue with the large database dump, as I have problems importing it using a bind mount as well.
Named Volume versus Bind-mount
I run a local version of MediaWiki to keep notes and links. Its
database dump, compressed with gzip -9, is just 10M. I ran four tests
using a named volume; the results are given below. Because I was not
sure whether VirtualBox had something to do with the slowdown, I quit
that program before the last test, but there was no significant
difference with the previous three runs.
Before each test I quit Docker and deleted Docker.qcow2 using:
$ rm Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/\
Docker.qcow2
Then I restarted Docker, created the container, and did the import as follows:
$ docker run --detach --name mysql-db --env MYSQL_ROOT_PASSWORD=S3CR3T \
--volume mysql-data:/var/lib/mysql mysql
$ time (gzip -dc wikidb-20160921-100357.sql.gz | docker exec --interactive \
mysql-db bash -c 'mysql -uroot -p$MYSQL_ROOT_PASSWORD')
Results, formatted horizontally:
+-----------+----------+----------+
| real | user | sys |
+-----------+----------+----------+
| 4m23.581s | 0m0.239s | 0m0.132s |
| 4m15.187s | 0m0.242s | 0m0.129s |
| 4m03.262s | 0m0.243s | 0m0.129s |
| 4m12.767s | 0m0.240s | 0m0.126s |
+-----------+----------+----------+
The average "real" over 4 runs is 4m13.634s.
Next I tested using a bind mount. Again, before each test I quit
Docker and deleted Docker.qcow2 using:
$ rm Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/\
Docker.qcow2
Then I restarted Docker, created the container, and did the import as follows:
$ mkdir -p ~/Docker/volumes/mysql-data/
$ docker run --detach --name mysql-db --env MYSQL_ROOT_PASSWORD=S3CR3T \
--volume /Users/john/Docker/volumes/mysql-data/:/var/lib/mysql mysql
$ time (gzip -dc wikidb-20160921-100357.sql.gz | docker exec --interactive \
mysql-db bash -c 'mysql -uroot -p$MYSQL_ROOT_PASSWORD')
$ rm -rf ~/Docker/volumes/mysql-data/
Results, formatted horizontally:
+-----------+----------+----------+
| real | user | sys |
+-----------+----------+----------+
| 0m21.033s | 0m0.242s | 0m0.133s |
| 0m21.281s | 0m0.236s | 0m0.132s |
| 0m19.348s | 0m0.235s | 0m0.128s |
| 0m20.143s | 0m0.238s | 0m0.128s |
+-----------+----------+----------+
The average "real" over 4 runs is 20.451s. Or using bind-mount instead of a named volume is 12.4 times faster.
Out of curiosity I also ran four tests inside an Ubuntu 15.10 installation running under VirtualBox, repeating the following two lines:
$ mysql -uroot --password=S3CR3T -e 'DROP DATABASE wikidb'
$ time (gzip -dc wikidb-20160921-100357.sql.gz | mysql -uroot --password=S3CR3T)
+-----------+----------+----------+
| real | user | sys |
+-----------+----------+----------+
| 0m48.117s | 0m1.196s | 0m0.220s |
| 0m40.243s | 0m1.228s | 0m0.160s |
| 0m39.176s | 0m1.208s | 0m0.136s |
| 0m42.999s | 0m1.172s | 0m0.328s |
+-----------+----------+----------+
As the configuration of MySQL might differ, I can't conclude that VirtualBox is about twice as slow as Docker for Mac using a bind mount. Moreover, I had migrated this virtual machine from Parallels, which might have resulted in a less than optimal virtual disk layout.
Conclusion
The current version of Docker for Mac, version 1.12.1 (build: 12133), is over 12 times slower when importing a MySQL database dump into a named volume than into a bind mount.
Discussion
It's not clear to me why a virtual disk is being used for storing data. Not only do named and unnamed volumes use this disk, it's also used for storing images, etc. As this virtual disk has a limit of 60G, which can be much lower than the space available on the host operating system, this can confuse users; they might get an "out of space" error message while they have more than enough space left on the host's hard drive. As all data is stored in a monolithic file, there is quite a risk of accidental data loss. And of course, this makes recovery of specific files from backup cumbersome.
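For what it's worth, the 60G is the virtual disk's maximum size; the backing file starts small and grows as data is written. The general distinction between a file's apparent size and its actually allocated space is easy to see with an ordinary sparse file (a throwaway demo file, not Docker.qcow2 itself; stat -c is the GNU coreutils variant):

```shell
# Create a file with a large apparent size but no allocated data,
# then compare what stat reports with what du reports.
truncate -s 1G demo-sparse.img
apparent=$(stat -c %s demo-sparse.img)        # apparent size in bytes
allocated=$(du -k demo-sparse.img | cut -f1)  # allocated space in KiB
echo "apparent: ${apparent} bytes, allocated: ${allocated} KiB"
rm demo-sparse.img
```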
Since it is already possible to access directories and files on the
host operating system, it's unclear to me why Docker chose this
method. I would have preferred to have images stored somewhere under
~/Library, and maybe the same for named and unnamed volumes.