John Bokma MexIT
freelance Perl programmer

Removing PI and comments from XML

Tuesday, November 15, 2005

Today someone asked in the Usenet group comp.lang.perl.modules for examples of using XML::Parser. I gave a link to Finding the number of unique XML elements, which has a small example of using the XML::Parser module. I also recommended having a look at the other XML-related Perl modules available on CPAN.

The original poster followed up on my reply, wondering "how to weed out processing statements and comments" (that is, processing instructions and comments), so I posted a small working example using XML::Parser:

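use strict;
use warnings;

use XML::Parser;

# The Stream style prints the document as it is parsed. Handlers set
# here override the corresponding handlers of the style, so comments
# and processing instructions end up in subs that print nothing.
my $parser = XML::Parser->new(
    Style    => 'Stream',
    Handlers => {
        Comment => \&xml_comment,
        Proc    => \&xml_pi,
    },
);

$parser->parse( <<'XML' );
<foo>
    <?some_processing_instruction?>
    <bar>
        some text
        <!-- comment -->
    </bar>
</foo>
XML

# Called for each comment: print nothing, so the comment is dropped.
sub xml_comment {
    return;
}

# Called for each processing instruction: print nothing either.
sub xml_pi {
    return;
}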

The Perl program gives the following output:

<foo>

    <bar>
        some text

    </bar>
</foo>

I also recommended having a look at XML::Twig, since its constructor (new) lets you specify how comments and processing instructions are handled. The comments option takes 'drop' (the default), 'keep', or 'process'; the pi option takes 'drop', 'keep' (the default), or 'process'.
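
For comparison, here is a minimal sketch of the same clean-up with XML::Twig. It is not part of what I posted to the newsgroup; it reuses the sample document from above and assumes XML::Twig is installed from CPAN. Both options are passed explicitly: comments => 'drop' only spells out the default, while pi => 'drop' is needed because processing instructions are kept by default. The variable name $out is just for illustration.

```perl
use strict;
use warnings;

use XML::Twig;

# Drop both comments and processing instructions while parsing.
my $twig = XML::Twig->new(
    comments => 'drop',    # already the default, stated for clarity
    pi       => 'drop',    # the default would be 'keep'
);

$twig->parse( <<'XML' );
<foo>
    <?some_processing_instruction?>
    <bar>
        some text
        <!-- comment -->
    </bar>
</foo>
XML

# sprint returns the document as a string instead of printing it.
my $out = $twig->sprint;
print $out, "\n";
```

The exact whitespace in the result depends on XML::Twig's output settings, so I don't show it here; the point is that neither the comment nor the processing instruction is present in what gets printed.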
