Perl programmer for hire: download my resume (PDF).
John Bokma's Hacking & Hiking

Hand coding an RSS 2.0 feed in Python

October 9, 2019

One requirement I have for tumblelog is that everything the Python version generates is identical to everything the Perl version generates. This means that sometimes I have to hand code a function that is available in a library for, say Python, but works differently in a Perl library.

When I was working on adding an RSS feed to Python version I decided to use lxml.etree as this would give me some control over the output. But, alas, not enough to match the output of the Perl version of tumblelog. Moreover, I was not really happy with the generated XML. So I decided to hand code both the Python and the Perl version.

def create_rss_feed(days, config):

    items = []
    todo = config['days']

    for day in days:
        (url, title, description) = get_url_title_description(day, config)

        # RFC #822 in USA locale
        ctime = end_of_day.ctime()
        pub_date = (f'{ctime[0:3]}, {end_of_day.day:02d} {ctime[4:7]}'
                        + end_of_day.strftime(' %Y %H:%M:%S %z'))

        items.append(
            ''.join([
                '<item>'
                '<title>', escape(title), '</title>'
                '<link>', escape(url), '</link>'
                '<guid isPermaLink="true">', escape(url), '</guid>'
                '<pubDate>', escape(pub_date), '</pubDate>'
                '<description>', escape(description), '</description>'
                '</item>'
            ])
        )
        todo -= 1
        if not todo:
            break


    xml = ''.join([
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">'
        '<channel>'
        '<title>', escape(config['name']), '</title>'
        '<link>', escape(config['blog-url']), '</link>'
        '<description>', escape(config['description']),'</description>'
        '<atom:link href="', escape(config['rss-feed-url']),
        '" rel="self" type="application/rss+xml" />',
        *items,
        '</channel>'
        '</rss>'
    ])
    feed_path = config['rss-path']
    p = Path(config['output-dir']).joinpath(feed_path)
    with p.open(mode='w', encoding='utf-8') as f:
        print(xml, file=f)

    if not config['quiet']:
        print(f"Created '{feed_path}'")

Note: if you think there are commas missing in the above code recall that in Python the compiler glues strings together if there is no comma between them.

Note: '<title>', escape(title), '</title>' can also be written as: 'f<title>{escape(title)}</title>' but I was afraid that this would make the code harder to read.

The code is quite straightforward. The calculation of the publication date is explained in RFC #822 and RFC #3339 dates in Python.

The escape function is imported from the html module.

If you are interested in the rest of the source code you can download it from GitHub.

Related