Today I received an email from Roy Schestowitz. He reported a problem with one of my Perl scripts, gpos.pl: it returned a lot of garbage for a specific query. After careful examination of the script I found the bug I had made: string interpolation inside a printf format string.
The format string is the first parameter to the printf function (or sprintf for that matter). The format string uses a special notation for placeholders of the actual values that follow the format string. The notation starts with a % sign, followed by one or more characters which specify how the value should be formatted. For example %d means a signed integer, in decimal notation.
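To make the notation concrete, here is a small sketch using sprintf, which takes the same format string as printf but returns the result instead of printing it. The values and variable names are made up for illustration.

```perl
use strict;
use warnings;

# %3d: a decimal integer, right-aligned in a field at least 3 characters wide
my $padded = sprintf "%3d", 7;

# %s: a string, copied as-is
my $text = sprintf "%s", "hello";

# Several placeholders are filled from the arguments, left to right
my $both = sprintf "%3d - %s", 42, "example";

# %% produces a literal percent sign and consumes no argument
my $percent = sprintf "100%%";

print "[$padded]\n";    # [  7]
print "[$text]\n";      # [hello]
print "[$both]\n";      # [ 42 - example]
print "[$percent]\n";   # [100%]
```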
The code containing the bug was as follows:
printf "%3d - $url\n", $position;
What actually happens is that the contents of $url are interpolated into the format string before printf ever sees it. This goes wrong if $url contains a % sign. Is this likely to happen? Yes, since the % sign is used in URLs to encode special characters by their hexadecimal value. For example %20 means a space. Hence in some cases the format string gains extra, unintended placeholders. For example, in a URL that contains September%202004%20Voice, the digits 202004 after the first % sign are read as a field width, which results in roughly 202004 spaces being output (Oops!).
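The effect is easy to reproduce on a smaller scale. The sketch below uses a hypothetical URL in which %20 happens to be followed by the letter s, so that after interpolation the format string contains %20s: a string placeholder with a field width of 20 and no argument left to fill it. It uses sprintf so the result can be inspected, and it collects warnings in an array rather than letting them go to STDERR. This is only an illustration, not the code from gpos.pl.

```perl
use strict;
use warnings;

# Hypothetical URL: "%20" is an encoded space, here followed by "script.pl"
my $url      = "http://example.com/my%20script.pl";
my $position = 1;

# Collect warnings so they can be shown alongside the output
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Same mistake as above: $url becomes part of the format string,
# so "%20s" is treated as a placeholder instead of literal text.
my $buggy = sprintf "%3d - $url", $position;

print "[$buggy]\n";
print "length of URL:    ", length($url), "\n";
print "length of output: ", length($buggy), "\n";
print "warning: $_" for @warnings;
```

The encoded space and the s disappear from the output and a run of 20 spaces takes their place. With warnings enabled, Perl should also complain about the missing argument, which is a useful hint when hunting for this kind of bug.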
It's very easy to make this mistake. After checking all my other publicly available Perl scripts I found the same bug in another one: prog.pl. I updated that script yesterday to make it work with the changes in the search engine it uses to extract information, but somehow overlooked this bug.
The fix is easy: don't interpolate a variable into the format string if its value might contain a % sign and so accidentally add extra formatting options. Pass it as an argument with a %s placeholder instead:
printf "%3d - %s\n", $position, $url;
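As a quick check, the following sketch wraps the fixed format in a small helper and runs it on a URL containing the troublesome fragment from above. The helper name format_result and the host example.com are made up for illustration; they are not taken from gpos.pl or prog.pl. It also shows an alternative for cases where interpolation is really wanted: doubling every % sign first, so that each one is printed literally.

```perl
use strict;
use warnings;

# Hypothetical helper: the format string is a constant, $url is only data
sub format_result {
    my ($position, $url) = @_;
    return sprintf "%3d - %s\n", $position, $url;
}

my $position = 7;
my $url      = "http://example.com/September%202004%20Voice";

# Option 1: pass the URL as an argument for %s
my $line_a = format_result($position, $url);

# Option 2: escape the URL before interpolating it (% becomes %%)
(my $escaped = $url) =~ s/%/%%/g;
my $line_b = sprintf "%3d - $escaped\n", $position;

print $line_a;    #   7 - http://example.com/September%202004%20Voice
print $line_b;    #   7 - http://example.com/September%202004%20Voice
```

Both options print the URL unchanged. The first is usually the better habit, because it keeps working no matter what the variable contains.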