A job queue in MySQL

Thursday, March 15, 2012 | 0 comments

Implementing a job queue using a table in MySQL might be not the best possible way, see Five subtle ways you're using MySQL as a queue, and why it'll bite you.

At the beginning of this year I had a problem: from a Perl program I had to select the oldest row in a MySQL database table to obtain some data and set the time of the row to the current time to prevent other processes to use the same row data immediately. In short: I wanted to update a row in a transaction and being able to read some data from it at the same time. The first solution that came to mind was using an UPDATE which would also set user-defined variables and then after the commit I could use SELECT to fetch the values assigned to those user-defined connection specific variables. While this is possible, see below, back then I couldn't find out the correct syntax to do this in the MySQL Reference Manual.

But while reading the Reference Manual I learned about SELECT ... FOR UPDATE and thought that I had found the solution:

START TRANSACTION
SELECT ... FROM table ORDER BY last_used ASC LIMIT 1 FOR UPDATE
UPDATE table SET last_used = UNIX_TIMESTAMP() WHERE ...
COMMIT

While this does work for a small number of parallel processes increasing the number of processes might soon run into one process holding a lock too long, causing the following error:

Lock wait timeout exceeded; try restarting transaction.

When I ran into this issue I did some more research and came upon the aforementioned article Five subtle ways you're using MySQL as a queue, and why it'll bite you by Baron Schwartz. It mentioned, amongst other, the above issue with SELECT ... FOR UPDATE and advices to use UPDATE with a LIMIT instead if you really want to use MySQL to implement a job queue, and to make sure that autocommit is enabled. The latter greatly reduces the time the lock is being held by a process.

The article itself didn't make clear to me how to both update the row and have access to some of the data. After a bit more reading on the Internet I learned that what I first thought would be possible, is indeed possible: setting a variable to the value of a column in the SET part of the UPDATE:

UPDATE some_table SET col1 = @col1 := col1, col2 = @col2 := col2 ...

This assigns the value of col1 to the user-defined variable @col1. The result of this expression, the value of col1 is then assigned back to col1, essentially a no-op in this case. The values assigned to the user-defined connection specific variables can be obtained using SELECT after the auto-commited transaction.

An outline of the Perl code I use to achieve this follows below:

my $dbh = DBI->connect(...);
$dbh->{ AutoCommit } = 1;  # enable, only needed if connect disabled this.
my $affected = $dbh->do( 
    'UPDATE table SET col1 = @col1 := col1, col2 = @col2 := col2,'
        . ' last_used = UNIX_TIMESTAMP() ORDER BY last_used ASC LIMIT 1' 
);
$affected == 1 or die 'UPDATE queue failed: no rows were affected';
my $row = $dbh->selectrow_hashref(
    'SELECT @col1 AS foo, @col2 AS bar' );
$dbh->disconnect();
print "$row->{ foo }, $row->{ bar }\n";

Note: if you use the queue in a circulair fashion, like I do, and use seconds after the epoch to keep track of usage be aware that if you rotate too fast that you might end up picking some rows with a much higher frequency. The solution is to use a more fine grained time like milliseconds since the epoch to achieve a more fair distribution.

Job queue in MySQL related

Five subtle ways you're using MySQL as a queue, and why it'll bite you
User-Defined Variables - MySQL 5.6 Reference Manual

Also today

Two MySQL tips (overview)
ON DUPLICATE KEY with IF

Please post a comment | read 0 comments | RSS feed