John Bokma MexIT
freelance Perl programmer

printf and Perl: hidden dangers

Tuesday, June 14, 2005

Today I received an email from Roy Schestowitz. He reported a problem with one of my Perl scripts: it seemed to return a lot of garbage for a specific query. After careful examination of the script I found the bug I had made: printf combined with string interpolation in the format string.

The format string is the first parameter to the printf function (or sprintf, for that matter). It uses a special notation for placeholders that stand in for the actual values following the format string. Each placeholder starts with a % sign, followed by one or more characters that specify how the value should be formatted. For example, %d means a signed integer in decimal notation.
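As a small illustration of the placeholder notation, here is a sketch using sprintf, which builds the same string that printf would print. The values 7 and "example" are made up for the demonstration.

```perl
use strict;
use warnings;

# The values after the format string fill the placeholders in order:
# %3d takes 7 (right-aligned in a field of width 3), %s takes the string.
my $line = sprintf "%3d - %s", 7, "example";

print "$line\n";    # prints "  7 - example"
```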

The code containing the bug was as follows:

printf "%3d - $url\n", $position;

What actually happens is that the contents of $url are interpolated into the format string before printf uses it. This goes wrong if $url contains a % sign. Is this likely to happen? Yes, since the % sign is used in URLs to encode special characters by their hexadecimal value. For example, %20 means a space. Hence in some cases the format string ends up with additional formatting options. For example, in a URL which contains September%202004%20Voice, the digits after the first % sign are read as a field width of 202004, which results in about 202004 spaces being output (Oops!).
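The effect can be reproduced on a small scale. The sketch below uses a made-up URL with a single %20 so that the stray field width stays small. It uses sprintf instead of printf so the results can be compared, and it switches warnings off around the buggy call only to keep the demonstration quiet.

```perl
use strict;
use warnings;

my $position = 1;
my $url      = 'http://example.com/a%20b';    # hypothetical URL

# Buggy: $url is interpolated first, so sprintf sees "%20b" as a
# conversion of its own (field width 20) with no argument to fill it.
my $buggy = do {
    no warnings;    # silence the missing-argument warning for this demo
    sprintf "%3d - $url", $position;
};

# Intended: the URL is passed as data and appears unchanged.
my $intended = sprintf "%3d - %s", $position, $url;

print "buggy:    [$buggy]\n";
print "intended: [$intended]\n";
```

Running this with warnings left enabled around the buggy call is a cheap way to catch the problem, since Perl then complains about the missing argument.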

Fixing the bug

It's very easy to make this mistake. After checking all my other publicly available Perl scripts I found another one with the same bug: I updated that script yesterday to make it work with the changes in the search engine it uses to extract information, but somehow overlooked this bug.

The fix is easy: don't interpolate variables into the format string if their contents might accidentally add extra formatting options. Pass them as arguments instead:

printf "%3d - %s\n", $position, $url;
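For comparison, here is a sketch of the %s fix next to one possible alternative for cases where a value really has to end up inside the format string: doubling every % sign so that it is treated as a literal percent sign. The URL, the position and the variable names are made up for the example, and the alternative is not what my scripts use.

```perl
use strict;
use warnings;

my $position = 12;
my $url      = 'http://example.com/September%202004%20Voice';    # hypothetical URL

# Preferred: pass the URL as data through %s.
my $fixed = sprintf "%3d - %s", $position, $url;

# Alternative: double every % so sprintf sees only literal percent signs.
(my $escaped = $url) =~ s/%/%%/g;
my $also_ok = sprintf "%3d - $escaped", $position;

print "$fixed\n";
print "$also_ok\n";
```

Both calls should produce the same line. The %s form is simpler and leaves less room for forgetting the escaping step.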

