Hand coding an RSS 2.0 feed in Python
October 9, 2019
One requirement I have for tumblelog
is that
everything the Python version generates is identical to everything the
Perl version generates. This means that sometimes I have to hand code
a function that is available in a library for, say Python, but works
differently in a Perl library.
When I was working on adding an RSS feed to Python version I decided
to use lxml.etree
as this would give me some control over the
output. But, alas, not enough to match the output of the Perl version
of tumblelog
. Moreover, I was not really happy with the generated
XML. So I decided to hand code both the Python and the Perl version.
def create_rss_feed(days, config):
items = []
todo = config['days']
for day in days:
(url, title, description) = get_url_title_description(day, config)
# RFC #822 in USA locale
ctime = end_of_day.ctime()
pub_date = (f'{ctime[0:3]}, {end_of_day.day:02d} {ctime[4:7]}'
+ end_of_day.strftime(' %Y %H:%M:%S %z'))
items.append(
''.join([
'<item>'
'<title>', escape(title), '</title>'
'<link>', escape(url), '</link>'
'<guid isPermaLink="true">', escape(url), '</guid>'
'<pubDate>', escape(pub_date), '</pubDate>'
'<description>', escape(description), '</description>'
'</item>'
])
)
todo -= 1
if not todo:
break
xml = ''.join([
'<?xml version="1.0" encoding="UTF-8"?>'
'<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">'
'<channel>'
'<title>', escape(config['name']), '</title>'
'<link>', escape(config['blog-url']), '</link>'
'<description>', escape(config['description']),'</description>'
'<atom:link href="', escape(config['rss-feed-url']),
'" rel="self" type="application/rss+xml" />',
*items,
'</channel>'
'</rss>'
])
feed_path = config['rss-path']
p = Path(config['output-dir']).joinpath(feed_path)
with p.open(mode='w', encoding='utf-8') as f:
print(xml, file=f)
if not config['quiet']:
print(f"Created '{feed_path}'")
Note: if you think there are commas missing in the above code recall that in Python the compiler glues strings together if there is no comma between them.
Note: '<title>', escape(title), '</title>'
can also be written as:
'f<title>{escape(title)}</title>'
but I was afraid that this would
make the code harder to read.
The code is quite straightforward. The calculation of the publication date is explained in RFC #822 and RFC #3339 dates in Python.
The escape
function is imported from the html
module.
If you are interested in the rest of the source code you can download it from GitHub.