John Bokma Perl
freelance Perl programmer

Comments: Create an RSS web feed automatically

6 comments

The following (beta) Perl program scans your document root, the directory containing the HTML files of your website, and automatically creates an RSS web feed for a given number of most recently modified pages.

Read the rest of Create an RSS web feed automatically.

Comments

OK, my Perl skills are rubbish compared to yours, but instead of displaying paragraph elements ('p') as you did, I set it to 'body'.

The site content (not my blog) consists of <div id="name"> stuff </div> elements that would be better to display, but I've only been messing with this for an hour or so.

I might try my feeble skills and look at setting the encoding to UTF-8 as an option, and see if I can add some <?xml-stylesheet... > stuff in there.

Good stuff

Posted by Bananas in the Falklands at 18:47 GMT on 3 January 2006

If you want the RSS output to validate better against a third-party tool, add

, encoding => 'UTF-8'

to

my $rss = new XML::RSS (version => '1.0');

and remove/comment the line

$link =~ s/index\.html?$//;
Posted by S J West at 17:03 GMT on 4 January 2006
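
A minimal sketch of the constructor change suggested above, assuming the XML::RSS module is installed. The channel and item values are placeholders, not taken from the original script, and the arrow form XML::RSS->new(...) is used here in place of new XML::RSS (...); both call the same constructor.

```perl
use strict;
use warnings;
use XML::RSS;

# Pass the encoding explicitly next to the RSS version.
my $rss = XML::RSS->new( version => '1.0', encoding => 'UTF-8' );

# Placeholder channel and item data (assumptions for illustration only).
$rss->channel(
    title       => 'Example feed',
    link        => 'http://example.com/',
    description => 'Most recently modified pages',
);
$rss->add_item(
    title => 'About',
    link  => 'http://example.com/about.html',
);

# The first line of the output is the XML declaration with the encoding.
my $xml = $rss->as_string;
my ($first_line) = split /\n/, $xml;
print "$first_line\n";
```

If the declaration already shows the encoding without the extra argument, the installed XML::RSS version sets it by default and the change is harmless.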

According to the XML::RSS module documentation (or at least the version I am using, which is 1.05):

perldoc XML::RSS
...
    new XML::RSS (version=>$version, encoding=>$encoding, output=>$output)
        Constructor for XML::RSS. It returns a reference to an XML::RSS
        object. You may also pass the RSS version and the XML encoding to
        use. The default version is 1.0. The default encoding is UTF-8. You
...

Of course I did check it today, and indeed, when I run the Perl program, the first line of output is:

<?xml version="1.0" encoding="UTF-8"?>

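I used the following one-liner to obtain the version number of the XML::RSS module (note: no space after the print, which avoids the need for quoting on Windows; in a Unix shell, put the code after -e in single quotes so that the shell does not expand $XML):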

perl -MXML::RSS -e print$XML::RSS::VERSION
1.05

So if you get no encoding it might be that you're using an older version of XML::RSS?

However, I am going to think about adding extra options to the script like specifying the encoding, RSS version, style sheet, image, etc.

Regarding the index.htm(l) removal, normally

http://example.com/

is set up to be an alias for

http://example.com/index.html

A validator shouldn't complain if the former is used if it's internally redirected to the latter.

It's often not a good idea to mix / and index.html on a site, so I recommend using the former.

Posted by John Bokma at 03:43 GMT on 6 January 2006
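
A small sketch of what the index.htm(l) removal does to a link, using only core Perl. The helper name strip_index and the sample URLs are made up for illustration; only the substitution itself comes from the script.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution discussed above.
sub strip_index {
    my ($link) = @_;
    $link =~ s/index\.html?$//;    # drop a trailing index.htm or index.html
    return $link;
}

# Sample links on a placeholder domain.
for my $link (
    'http://example.com/index.html',
    'http://example.com/blog/index.htm',
    'http://example.com/about.html',
) {
    print $link, ' -> ', strip_index($link), "\n";
}
```

The first two links are reduced to http://example.com/ and http://example.com/blog/, and the third is left unchanged. Note that the pattern is not anchored to a slash, so a file name that merely ends in index.html would be shortened as well.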

@Bananas

If you modify the line

my $p_element = $root->look_down( _tag => 'p' );

to

my $p_element = $root->look_down( _tag => 'div' );

you get the first div in $p_element (which probably should be renamed to $div_element when you make that change). If you don't want the first div, but the 3rd one, use:

my $p_element = ( $root->look_down( _tag => 'div' ) )[ 2 ];

Remember, programmers count 0, 1, 2, etc.

If the div has an id, or class, you might be able to use something like:

my $p_element = $root->look_down( _tag => 'div', class => 'class name' );
Posted by John Bokma at 04:04 GMT on 6 January 2006
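
The look_down variants above can be tried on a small document. This is a sketch that assumes HTML::TreeBuilder is installed; the HTML snippet, the id values, and the variable names are invented for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up page with three div elements.
my $html = <<'HTML';
<html><body>
<div id="menu">Navigation</div>
<div id="content" class="main">First paragraph of the page.</div>
<div id="footer">Footer</div>
</body></html>
HTML

my $root = HTML::TreeBuilder->new_from_content($html);

# Scalar context: look_down returns the first matching element.
my $first_div = $root->look_down( _tag => 'div' );

# List context: all matches; index 2 is the third div.
my $third_div = ( $root->look_down( _tag => 'div' ) )[2];

# Extra criteria narrow the match to a given id (or class).
my $content_div = $root->look_down( _tag => 'div', id => 'content' );

my @texts = map { $_->as_text } $first_div, $third_div, $content_div;
print "$_\n" for @texts;

$root->delete;    # free the tree when done
```

With this input, the three lines printed are the text of the menu div, the footer div, and the content div, in that order.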

Thanks, John. The 'div', id => 'something' trick works a treat here. Perl can do it; it's just that you have to know how.

The 'perl -MXML::RSS -e print$XML::RSS::VERSION' one-liner does nothing for me, probably because my local setup is screwed up.

When I kept the $link =~ s/index\.html?$//; line in and ran the feed through the Feed Validator for Atom and RSS, it didn't like it.

I'm sticking with the encoding line too - it works here.

Apparently I need some extra SAX Perl modules for stylesheets and the href="" and type entries. As I say, I'm rubbish at Perl - a discovery of a sort.

It's still a cool script!

Posted by Bananas in the Falklands at 15:58 GMT on 11 January 2006

I guess I found the problem you both are seeing. The error you've been seeing is probably:

rdf:about values must not be duplicated within a feed

It happens for http://example.com/ when http://example.com/ is also added as an item to the feed. I consider this a bug in either the feed validator or the RSS specification, since it means that a feed for http://example.com/ can never have an item for http://example.com/.

I will look into it further; thanks for reporting.

Posted by John Bokma at 19:57 GMT on 11 January 2006
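
A rough way to spot this situation before publishing a feed, sketched in core Perl. It assumes, as the comment above suggests, that the channel and each item end up with their link as the rdf:about value; the function name and the sample links are hypothetical.

```perl
use strict;
use warnings;

# Return every URL that occurs more than once among the channel link
# and the item links, i.e. the values a validator would flag.
sub duplicate_about_values {
    my ( $channel_link, @item_links ) = @_;
    my %seen;
    $seen{$_}++ for $channel_link, @item_links;
    return sort grep { $seen{$_} > 1 } keys %seen;
}

# Placeholder data: the home page is both the channel and an item.
my @duplicates = duplicate_about_values(
    'http://example.com/',
    'http://example.com/',
    'http://example.com/about.html',
);

print "duplicate rdf:about: $_\n" for @duplicates;
```

For the sample data this reports http://example.com/ once. Keeping index.html in the item link, as suggested earlier in the thread, avoids the clash because the two values then differ.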
