John Bokma's Hacking & Hiking

Hand coding an RSS 2.0 feed in Perl

October 9, 2019

One requirement I have for tumblelog is that everything the Perl version generates is identical to everything the Python version generates. This means that sometimes I have to hand code a function that is available in a library for, say, Python, but works differently in a Perl library.

When I was working on adding an RSS feed to the Perl version I decided to use XML::RSS at first. But I didn't like the generated output, partly because I had little to no control over it. Next I gave XML::Writer a spin, but couldn't match the output of the Python version, which used lxml.etree. So I decided to hand code both the Perl and the Python version, as it's a simple feed and XML::Writer just added a few wrapper functions.

my @MON_LIST = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
my @DAY_LIST = qw( Sun Mon Tue Wed Thu Fri Sat );

sub create_rss_feed {

    my ( $days, $config ) = @_;

    my @items;
    my $todo = $config->{ days };

    for my $day ( @$days ) {

        my ( $url, $title, $description )
            = get_url_title_description( $day, $config );

        my $end_of_day = get_end_of_day( $day->{ date } );
        # RFC #822 in USA locale
        my $pub_date = $DAY_LIST[ $end_of_day->_wday() ]
            . sprintf( ', %02d ', $end_of_day->mday() )
            . $MON_LIST[ $end_of_day->_mon ]
            . $end_of_day->strftime( ' %Y %H:%M:%S %z' );

        push @items, join( '',
            '<item>',
            '<title>', escape( $title ), '</title>',
            '<link>', escape( $url ), '</link>',
            '<guid isPermaLink="true">', escape( $url ), '</guid>',
            '<pubDate>', escape( $pub_date ), '</pubDate>',
            '<description>', escape( $description ), '</description>',
            '</item>'
        );
        --$todo or last;
    }

    my $xml = join( '',
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">',
        '<channel>',
        '<title>', escape( $config->{ name } ), '</title>',
        '<link>', escape( $config->{ 'blog-url' } ), '</link>',
        '<description>', escape( $config->{ description } ),'</description>',
        '<atom:link href="', escape( $config->{ 'rss-feed-url' } ),
        '" rel="self" type="application/rss+xml" />',
        @items,
        '</channel>',
        '</rss>',
        "\n"
    );

    my $path = $config->{ 'rss-path' };
    path( "$config->{ 'output-dir' }/$path" )
        ->append_utf8( { truncate => 1 }, $xml );
    $config->{ quiet } or print "Created '$path'\n";

    return;
}

The code is quite straightforward. The calculation of the publication date is explained in "RFC #822 and RFC #3339 dates in Perl".
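To make the date step concrete, here is a minimal, standalone sketch of the same idea: the day and month names come from fixed English lists, so the result does not depend on the current locale. It assumes a Time::Piece object in UTC; the helper name rfc822_date and the hard-coded +0000 offset are assumptions for illustration and not part of tumblelog, which uses strftime with %z instead.

```perl
use strict;
use warnings;
use Time::Piece;

my @MON_LIST = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
my @DAY_LIST = qw( Sun Mon Tue Wed Thu Fri Sat );

# Hypothetical helper: format a Time::Piece object as an RFC #822 date.
# Assumption: the object is in UTC, hence the fixed "+0000" offset.
sub rfc822_date {

    my $t = shift;

    return $DAY_LIST[ $t->_wday ]             # _wday: 0 = Sunday
        . sprintf( ', %02d ', $t->mday )
        . $MON_LIST[ $t->_mon ]               # _mon: 0 = January
        . $t->strftime( ' %Y %H:%M:%S' )
        . ' +0000';
}

# Time::Piece->strptime returns a UTC object
my $t = Time::Piece->strptime( '2019-10-09 23:59:59', '%Y-%m-%d %H:%M:%S' );
print rfc822_date( $t ), "\n";    # Wed, 09 Oct 2019 23:59:59 +0000
```

Only the numeric parts go through strftime; the names are looked up by index, which keeps the output stable regardless of locale settings.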

The escape function is also hand coded to make sure its output is identical to the escape function in Python's html module:

sub escape {

    my $str = shift;

    for ( $str ) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
        s/"/&quot;/g;
        s/'/&#x27;/g;
    }
    return $str;
}

It uses a for loop to make $_ an alias for $str. As the s operator defaults to $_, this saves some typing and, in my opinion, is clearer.
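The aliasing idiom can be tried in isolation with the small sketch below. The function name shout and the sample string are made up for illustration. Inside the loop every operator that defaults to $_ works directly on $str, and because $str is a copy made by shift, the caller's variable is left alone.

```perl
use strict;
use warnings;

# Hypothetical example function, not part of tumblelog
sub shout {

    my $str = shift;      # a copy of the caller's value

    for ( $str ) {        # $_ is now an alias for $str
        s/^\s+//;         # same as $str =~ s/^\s+//
        s/\s+$//;         # same as $str =~ s/\s+$//
        tr/a-z/A-Z/;      # tr also defaults to $_
    }
    return $str;
}

my $original = '  hello world  ';
my $result   = shout( $original );

print "[$result]\n";      # [HELLO WORLD]
print "[$original]\n";    # [  hello world  ]
```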

Note that a single quote is replaced with the hexadecimal character reference &#x27; instead of &apos; for maximum compatibility; see also "Character entity references in HTML 4".
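A quick way to check the behaviour is to run escape on a string that contains all five special characters. The sketch below repeats the function so it can run on its own; the sample strings are made up. The order of the substitutions matters: the ampersand has to be replaced first, otherwise the ampersands introduced by the later replacements would be escaped a second time.

```perl
use strict;
use warnings;

sub escape {

    my $str = shift;

    for ( $str ) {
        s/&/&amp;/g;      # first, so later replacements are not re-escaped
        s/</&lt;/g;
        s/>/&gt;/g;
        s/"/&quot;/g;
        s/'/&#x27;/g;     # hexadecimal reference instead of &apos;
    }
    return $str;
}

print escape( q{Tom & Jerry's "<show>"} ), "\n";
# Tom &amp; Jerry&#x27;s &quot;&lt;show&gt;&quot;

print escape( '&lt;' ), "\n";
# &amp;lt;
```

As far as I know, Python's html.escape with its default quote=True returns the same strings for these inputs.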

If you are interested in the rest of the source code you can download it from GitHub.
