The following (beta) Perl program scans your document root, the directory containing the HTML files of your website, and automatically creates an RSS web feed for a given number of most recently modified pages.
If you want the RSS output to validate better against a third-party tool, add
, encoding => 'UTF-8'
to
my $rss = new XML::RSS (version => '1.0');
and remove/comment the line
$link =~ s/index\.html?$//;
According to the XML::RSS module documentation (or at least the version I am using, which is 1.05):
perldoc XML::RSS
...
new XML::RSS (version=>$version, encoding=>$encoding, output=>$output)
Constructor for XML::RSS. It returns a reference to an XML::RSS
object. You may also pass the RSS version and the XML encoding to
use. The default version is 1.0. The default encoding is UTF-8. You
...
I checked this today, and indeed, when I run the Perl program, the first line of output is:
<?xml version="1.0" encoding="UTF-8"?>
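For readers who prefer not to rely on the module default, here is a minimal sketch that passes the encoding explicitly and prints only the first line of the generated feed. The channel and item values are placeholders made up for illustration; they are not taken from the actual program.

```perl
use strict;
use warnings;
use XML::RSS;

# Pass the encoding explicitly instead of relying on the module default.
my $rss = XML::RSS->new( version => '1.0', encoding => 'UTF-8' );

# Placeholder channel data, for illustration only.
$rss->channel(
    title       => 'Example site',
    link        => 'http://example.com/',
    description => 'Recently modified pages',
);

# Placeholder item, for illustration only.
$rss->add_item(
    title       => 'Example page',
    link        => 'http://example.com/page.html',
    description => 'An example item',
);

# The XML declaration is the first line of the serialized feed.
my $output = $rss->as_string;
my ($first_line) = split /\n/, $output;
print "$first_line\n";
```

If the printed declaration lacks an encoding attribute, that would point at the installed XML::RSS version rather than at the program.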
I used the following one-liner to obtain the version number of the XML::RSS module (note: no space after the print):
perl -MXML::RSS -e print$XML::RSS::VERSION
1.05
So if you get no encoding it might be that you're using an older version of XML::RSS?
However, I am going to think about adding extra options to the script like specifying the encoding, RSS version, style sheet, image, etc.
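The version check above can also be written as a small script, which sidesteps shell quoting entirely. On a Unix-like shell the unquoted $XML in the one-liner is expanded by the shell before Perl sees it, which may be why it prints nothing on some setups. The sketch below assumes only that XML::RSS is installed.

```perl
use strict;
use warnings;
use XML::RSS;

# On a Unix-like shell, quote the one-liner so the shell leaves $XML alone:
#   perl -MXML::RSS -e 'print "$XML::RSS::VERSION\n"'

# The same check as a script; VERSION is the standard UNIVERSAL method
# that returns the module's $VERSION.
my $version = XML::RSS->VERSION;
print "XML::RSS version: $version\n";
```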
Regarding the index.htm(l) removal, normally
http://example.com/
is set up to be an alias for
http://example.com/index.html
A validator shouldn't complain when the former is used, as long as it is internally redirected to the latter.
It's often not a good idea to mix / and index.html on a site, so I recommend using the former.
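To see what the index.htm(l) removal does to a link, here is a small sketch that wraps the substitution from the program in a helper. The name strip_index and the sample URLs are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same substitution the program uses.
sub strip_index {
    my ($link) = @_;
    $link =~ s/index\.html?$//;    # matches a trailing index.htm or index.html
    return $link;
}

# Sample URLs, for illustration only.
for my $url (
    'http://example.com/index.html',
    'http://example.com/docs/index.htm',
    'http://example.com/about.html',
) {
    printf "%-36s => %s\n", $url, strip_index($url);
}
```

Links that do not end in index.htm or index.html pass through unchanged.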
@Bananas
If you modify the line
my $p_element = $root->look_down( _tag => 'p' );
to
my $p_element = $root->look_down( _tag => 'div' );
you get the first div in $p_element (which probably should be renamed to $div_element when you make that change). If you don't want the first div, but the 3rd one, use:
my $p_element = ( $root->look_down( _tag => 'div' ) )[ 2 ];
Remember, programmers count from zero: 0, 1, 2, etc.
If the div has an id, or class, you might be able to use something like:
my $p_element = $root->look_down( _tag => 'div', class => 'class name' );
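The look_down variants above can be tried side by side on a small document. This is a sketch only: the HTML snippet, the id and class values, and the variable names are invented for the example, and it assumes HTML::TreeBuilder is installed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up page, for illustration only.
my $html = <<'HTML';
<html><body>
<div class="nav">Menu</div>
<div id="name">Main content</div>
<div class="footer">Footer</div>
<p>First paragraph</p>
</body></html>
HTML

my $root = HTML::TreeBuilder->new_from_content($html);

# Scalar context: look_down returns the first matching element.
my $first_div = $root->look_down( _tag => 'div' );

# List context: look_down returns all matches, so index 2 is the third div.
my $third_div = ( $root->look_down( _tag => 'div' ) )[2];

# Narrow the match by attribute value.
my $by_id    = $root->look_down( _tag => 'div', id    => 'name' );
my $by_class = $root->look_down( _tag => 'div', class => 'nav' );

# Collect the text before freeing the tree.
my %text = (
    first    => $first_div->as_text,
    third    => $third_div->as_text,
    by_id    => $by_id->as_text,
    by_class => $by_class->as_text,
);
$root->delete;

print "$_: $text{$_}\n" for sort keys %text;
```

Note that look_down returns nothing when no element matches, so real code should check the result before calling as_text on it.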
Thanks John, the 'div', id => 'something' trick works a treat here. Perl can do it; you just have to know how.
The 'perl -MXML::RSS -e print$XML::RSS::VERSION' one-liner does nothing for me, probably because my local setup is screwed up.
When I kept the $link =~ s/index\.html?$//; line in and ran the result through the Feed Validator for Atom and RSS, it didn't like it.
I'm sticking with the encoding line too; it works here.
Apparently I need some extra SAX Perl modules for stylesheets and the href="" and type entries. As I say, I'm rubbish at Perl, so this is a discovery of sorts.
It's still a cool script!
I think I found the problem you are both seeing. The error is probably:
rdf:about values must not be duplicated within a feed
And it happens for http://example.com/ when http://example.com/ is also added as an item to the feed. I consider this a bug in either the feed validator or the RSS specification, since it means that a feed for http://example.com/ can never have an item for http://example.com/.
I will look into it further; thanks for reporting.
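The clash described above can be spotted before the feed is written. The following is a rough sketch, assuming that the channel and every item take their rdf:about value from their link; the helper name duplicated_about and the sample URLs are hypothetical and not part of the program.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every URL that would be used as
# rdf:about more than once (channel link plus item links).
sub duplicated_about {
    my ( $channel_link, @item_links ) = @_;
    my %count;
    $count{$_}++ for $channel_link, @item_links;
    my @dupes = grep { $count{$_} > 1 } keys %count;
    return sort @dupes;
}

# Sample data: the home page is both the channel link and an item.
my @dupes = duplicated_about(
    'http://example.com/',
    'http://example.com/',
    'http://example.com/about.html',
);
print "duplicate rdf:about: $_\n" for @dupes;
```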
OK, my Perl skills are rubbish compared to yours, but instead of displaying paragraph elements ('p') as you did, I set it to 'body'.
The site content (not my blog) consists of <div id="name"> stuff </div>, which would be better to display, but I've only been messing with this for an hour or so.
I might try my feeble skills at setting the encoding to UTF-8 as an option, and see if I could add some <?xml-stylesheet... > stuff in there.
Good stuff.