Hand coding an RSS 2.0 feed in Perl
October 9, 2019
One requirement I have for tumblelog is that everything the Perl
version generates is identical to everything the Python version
generates. This means that sometimes I have to hand code a function
that is available in a library for, say, Python, but works differently
in a Perl library.
When I was working on adding an RSS feed to the Perl version I decided
to use XML::RSS at first. But I didn't like the generated output, in
part because I had little to no control over it. Next I gave
XML::Writer a spin, but couldn't match the output of the Python
version, which used lxml.etree. So I decided to hand code both the Perl
and the Python version, as it's a simple feed and XML::Writer just
added a few wrapper functions.
    my @MON_LIST = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
    my @DAY_LIST = qw( Sun Mon Tue Wed Thu Fri Sat );

    sub create_rss_feed {

        my ( $days, $config ) = @_;

        my @items;
        my $todo = $config->{ days };
        for my $day ( @$days ) {
            my ( $url, $title, $description )
                = get_url_title_description( $day, $config );
            my $end_of_day = get_end_of_day( $day->{ date } );
            # RFC #822 in USA locale
            my $pub_date = $DAY_LIST[ $end_of_day->_wday() ]
                . sprintf( ', %02d ', $end_of_day->mday() )
                . $MON_LIST[ $end_of_day->_mon ]
                . $end_of_day->strftime( ' %Y %H:%M:%S %z' );

            push @items, join( '',
                '<item>',
                '<title>', escape( $title ), '</title>',
                '<link>', escape( $url ), '</link>',
                '<guid isPermaLink="true">', escape( $url ), '</guid>',
                '<pubDate>', escape( $pub_date ), '</pubDate>',
                '<description>', escape( $description ), '</description>',
                '</item>'
            );

            --$todo or last;
        }

        my $xml = join( '',
            '<?xml version="1.0" encoding="UTF-8"?>',
            '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">',
            '<channel>',
            '<title>', escape( $config->{ name } ), '</title>',
            '<link>', escape( $config->{ 'blog-url' } ), '</link>',
            '<description>', escape( $config->{ description } ),'</description>',
            '<atom:link href="', escape( $config->{ 'rss-feed-url' } ),
            '" rel="self" type="application/rss+xml" />',
            @items,
            '</channel>',
            '</rss>',
            "\n"
        );

        my $path = $config->{ 'rss-path' };
        path( "$config->{ 'output-dir' }/$path" )
            ->append_utf8( { truncate => 1 }, $xml );
        $config->{ quiet } or print "Created '$path'\n";

        return;
    }
The code is quite straightforward. The calculation of the publication date is explained in "RFC #822 and RFC #3339 dates in Perl".
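As a minimal sketch of the date calculation on its own: the snippet below assumes that $end_of_day is a Time::Piece object, which the _wday and _mon calls suggest. The helper name rfc822_date and the sample timestamp are made up for illustration and are not part of tumblelog. The day and month names come from fixed lists rather than from strftime's %a and %b, so the result should not change with the locale the script runs under.

```perl
use strict;
use warnings;
use Time::Piece;

my @MON_LIST = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
my @DAY_LIST = qw( Sun Mon Tue Wed Thu Fri Sat );

# Hypothetical helper: formats a Time::Piece object as an RFC #822 date
sub rfc822_date {
    my $tp = shift;
    return $DAY_LIST[ $tp->_wday() ]          # _wday: 0 = Sunday
        . sprintf( ', %02d ', $tp->mday() )   # zero-padded day of month
        . $MON_LIST[ $tp->_mon() ]            # _mon: 0 = January
        . $tp->strftime( ' %Y %H:%M:%S %z' ); # year, time, numeric zone
}

# Sample timestamp, chosen for illustration only
my $end_of_day = Time::Piece->strptime(
    '2019-10-09 23:59:59', '%Y-%m-%d %H:%M:%S'
);
print rfc822_date( $end_of_day ), "\n";
```

The printed line starts with the weekday, day, month, year and time, followed by the numeric time zone offset that %z yields for the object.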
The escape function is also hand coded to make sure its output is
identical to the escape function in Python's html module:
    sub escape {

        my $str = shift;

        for ( $str ) {
            s/&/&amp;/g;
            s/</&lt;/g;
            s/>/&gt;/g;
            s/"/&quot;/g;
            s/'/&#x27;/g;
        }
        return $str;
    }
It uses a for loop to alias $_ to $str. As the s operator defaults to
$_, this saves some typing and, in my opinion, is clearer.
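To make the aliasing idiom concrete, here is a small sketch that puts the for version next to an equivalent spelled out with the =~ binding operator. The name escape_explicit and the sample string are hypothetical and only serve the comparison. Note that in both versions the ampersand has to be replaced first, otherwise the ampersands introduced by the later substitutions would be escaped a second time.

```perl
use strict;
use warnings;

# Version using for to alias $_ to $str
sub escape {
    my $str = shift;
    for ( $str ) {
        s/&/&amp;/g;    # must come first to avoid double escaping
        s/</&lt;/g;
        s/>/&gt;/g;
        s/"/&quot;/g;
        s/'/&#x27;/g;
    }
    return $str;
}

# Hypothetical equivalent that binds every substitution explicitly
sub escape_explicit {
    my $str = shift;
    $str =~ s/&/&amp;/g;
    $str =~ s/</&lt;/g;
    $str =~ s/>/&gt;/g;
    $str =~ s/"/&quot;/g;
    $str =~ s/'/&#x27;/g;
    return $str;
}

my $sample = q{<a href="x">Tom & Jerry's</a>};
print escape( $sample ), "\n";
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;
print escape_explicit( $sample ) eq escape( $sample ) ? "same\n" : "different\n";
```

Because both functions work on a copy made with shift, the caller's string is left untouched.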
Note that a single quote is replaced with the hexadecimal character
reference &#x27; instead of &apos; for maximum compatibility; see also
Character entity references in HTML 4.
If you are interested in the rest of the source code you can download it from GitHub.