Hire a senior Perl / Python programmer today; download my up-to-date resume (PDF)
John Bokma MexIT
freelance Perl programmer

Spreadsheet::ParseExcel and UCS2 data

Tuesday, February 24, 2009 | 1 comment

When a Perl program I had written had some encoding issues with spreadsheet data a customer fed to it, I asked if he could email me the Excel spreadsheet so I could have a look.

After a quick examination I noticed that some cells had their data encoded as UCS2. So I decided to use the excellent Encode module to convert the data to UTF-8. At the top of the Perl program I added:

use Encode;

And to the loop that processed all spreadsheet rows I added a check to see if a cell had a defined Code field, and if so, if the encoding equals 'ucs2'. In which case the cell value is converted from UCS-2BE to UTF-8.

my $cell = $cells->[ $row ][ $col ];
my $value = $cell->{ Val };
if ( defined $cell->{ Code } ) {

    if ( $cell->{ Code } eq 'ucs2' ) {

        $value = Encode::decode( 'UCS-2BE', $value );
    else {

        die "$row: unknown encoding: $cell->{ Code }";

In case an unknown encoding is encountered, the program halts with an error message. Based on the documentation of Spreadsheet::ParseExcel the only other value besides undef, and ucs2 the character encoding of a cell can have is _native_, which is unlikely to occur in this case. And since it's unlikely, I decided to handle this as an exception using die.


Also today

Please post a comment | read 1 comment by Michael Collinson | RSS feed