I am working on quite a big project using Perl and MS SQL, and last Monday I ran into an issue. Because most of the week I was very busy, I decided to write down the problem and solution (or better: solutions) today.
In the project, addresses of web pages need to be stored in a table in a MS SQL database, so I called the table URL, and in order to check if a URL is already in the table I put a UNIQUE constraint on the URL itself (URL.Value).
When the Perl program attempted to store quite a long URL I was greeted by the following error:
The index entry of length 911 bytes for the index 'UQ__URL__1367E606'
exceeds the maximum length of 900 bytes.
Argh! But I wasn't really worried, because I could think up a simple solution that would work, adding an extra column and store, for example an MD5 digest (checksum), calculated from the web address into it, and put an index on it (but not an UNIQUE constraint, since two different URLs can result in exactly the same checksum). The checksum can then be used to look up the URL(s) fast, and a final compare between the value(s) found and the value given is enough to find the right unique URL (and hence the unique URLID).
Since I was not sure that this was a good or recommended solution, I posted my question in the Usenet group comp.databases.ms-sqlserver. Since I couldn't really wait for the answer, I decided to implement my solution anyway. And since I believe strongly in letting the database server do as much work as possible, I checked the documentation for a checksum function. I preferred not to do the calculation in Perl, if possible.
And when I read the documentation on the checksum function, I was confronted with exactly the solution I was about to implement:
The CHECKSUM function may be used to build hash indices. The hash index is built by adding a computed checksum column to the table being indexed, then building an index on the checksum column.
followed by an example. Because the checksum column is calculated when doing for example an insert, SET ARITHABORT must be set to ON, see the documentation on ARITHABORT in SQL Server Books Online.
So I implemented the following Perl code (snippet):
$dbh->do( 'SET ARITHABORT ON' );
my $sth = $dbh->prepare( <<'SQL' );
IF NOT EXISTS( SELECT URLID FROM URL WHERE CheckSumValue=checksum(?) AND Value=? )
INSERT URL (Value) VALUES( ? )
SQL
$sth->execute( $url, $url, $url );
$sth->finish;
$sth = $dbh->prepare( <<'SQL' );
SELECT URLID FROM URL WHERE CheckSumValue=checksum(?) AND Value=?
SQL
$sth->execute( $url, $url );
my ( $url_id ) = $sth->fetchrow_array;
$sth->finish;
which didn't work, since I got the following error at the line executing the SELECT statement:
DBD::ODBC::st execute failed: [Microsoft][ODBC SQL Server Driver]Invalid character value
for cast specification (SQL-22018)(DBD: st_execute/SQLExecute err=-1)
Very odd, since the SELECT in the IF EXISTS does work. On an empty URL table, I got the following result after the error message (using OSQL -E):
1> SELECT COUNT(*) FROM URL
2> GO
-----------
1
(1 row affected)
>
Since I had not much time to dive into this, so I rewrote the above Perl code into (snippet):
$dbh->do( 'SET ARITHABORT ON' );
my $sth = $dbh->prepare( <<'SQL' );
SELECT checksum( ? )
SQL
$sth->execute( $url );
my ( $checksum ) = $sth->fetchrow_array;
$sth->finish;
$sth = $dbh->prepare( <<'SQL' );
IF NOT EXISTS( SELECT URLID FROM URL WHERE CheckSumValue=? AND Value=? )
INSERT URL (Value) VALUES( ? )
SQL
$sth->execute( $checksum, $url, $url );
$sth->finish;
$sth = $dbh->prepare( <<'SQL' );
SELECT URLID FROM URL WHERE CheckSumValue=? AND Value=?
SQL
$sth->execute( $checksum, $url );
my ( $url_id ) = $sth->fetchrow_array;
$sth->finish;
So much for letting the database do as much work as possible, but at least this code works.
However, today while writing this entry, I had some time to experiment with a new idea.
Since I already knew that the
problem was in the checksum(?)
part, I decided to set explicitly the type (SQL_VARCHAR) of
the column using the bind_param method of the DBI module:
use DBI 'SQL_VARCHAR';
:
$dbh->do( 'SET ARITHABORT ON' );
my $sth = $dbh->prepare( <<'SQL' );
IF NOT EXISTS( SELECT URLID FROM URL WHERE CheckSumValue=checksum(?) AND Value=? )
INSERT URL (Value) VALUES( ? )
SQL
$sth->execute( $url, $url, $url );
$sth->finish;
$sth = $dbh->prepare( <<'SQL' );
SELECT URLID FROM URL WHERE CheckSumValue=checksum(?) AND Value=?
SQL
$sth->bind_param( 1, $url, SQL_VARCHAR );
$sth->bind_param( 2, $url );
$sth->execute;
my ( $url_id ) = $sth->fetchrow_array;
$sth->finish;
This did the trick. Note how the SQL_VARCHAR constant is imported by specifying it in the use DBI
line.