Implementing a job queue using a table in MySQL might be not the best possible way, see Five subtle ways you're using MySQL as a queue, and why it'll bite you.
At the beginning of this year I had a problem: from a Perl
program I had to select the oldest row in a MySQL database
table to obtain some data and set the time of the row to the
current time to prevent other processes to use the same row
data immediately. In short: I wanted to update a row in a
transaction and being able to read some data from it at the
same time. The first solution that came to mind was using an
UPDATE
which would also set user-defined
variables and then after the commit I could use
SELECT
to fetch the values assigned to those
user-defined connection specific variables. While this is
possible, see below, back then I couldn't find out the
correct syntax to do this in the MySQL Reference Manual.
But while reading the Reference Manual I learned about
SELECT ... FOR UPDATE
and thought that I had found the solution:
START TRANSACTION
SELECT ... FROM table ORDER BY last_used ASC LIMIT 1 FOR UPDATE
UPDATE table SET last_used = UNIX_TIMESTAMP() WHERE ...
COMMIT
While this does work for a small number of parallel processes increasing the number of processes might soon run into one process holding a lock too long, causing the following error:
Lock wait timeout exceeded; try restarting transaction.
When I ran into this issue I did some more research and came
upon the aforementioned article Five
subtle ways you're using MySQL as a queue, and why it'll
bite you by Baron Schwartz. It mentioned, amongst other,
the above issue with SELECT ... FOR UPDATE
and
advices to use UPDATE
with a LIMIT
instead if you really want to use MySQL to implement a job
queue, and to make sure that autocommit is enabled. The
latter greatly reduces the time the lock is being held by a
process.
The article itself didn't make clear to me how to both
update the row and have access to some of the data. After
a bit more reading on the Internet I learned that what I
first thought would be possible, is indeed possible:
setting a variable to the value of a column in the SET
part of the UPDATE:
UPDATE some_table SET col1 = @col1 := col1, col2 = @col2 := col2 ...
This assigns the value of col1
to the
user-defined variable @col1
. The result of this
expression, the value of col1
is then assigned
back to col1
, essentially a no-op in this
case. The values assigned to the user-defined connection
specific variables can be obtained using SELECT
after the auto-commited transaction.
An outline of the Perl code I use to achieve this follows below:
my $dbh = DBI->connect(...);
$dbh->{ AutoCommit } = 1; # enable, only needed if connect disabled this.
my $affected = $dbh->do(
'UPDATE table SET col1 = @col1 := col1, col2 = @col2 := col2,'
. ' last_used = UNIX_TIMESTAMP() ORDER BY last_used ASC LIMIT 1'
);
$affected == 1 or die 'UPDATE queue failed: no rows were affected';
my $row = $dbh->selectrow_hashref(
'SELECT @col1 AS foo, @col2 AS bar' );
$dbh->disconnect();
print "$row->{ foo }, $row->{ bar }\n";
Note: if you use the queue in a circulair fashion, like I do, and use seconds after the epoch to keep track of usage be aware that if you rotate too fast that you might end up picking some rows with a much higher frequency. The solution is to use a more fine grained time like milliseconds since the epoch to achieve a more fair distribution.