John Bokma's Hacking & Hiking

ISO week and year in Perl

September 4, 2019

Today I discovered a bug in how the Perl version of tumblelog, the static microblog generator I wrote, calculates its weekly archives: instead of the ISO year I used the calendar year, which caused subtle errors at the beginning and end of some years during testing.

For example, 2008-12-29 falls in ISO week 1 of 2009, but my code turned it into week 1 of 2008. And because I process dates in order, this odd week ended up at the end of 2008. Oops.

One example of the wrong way to do this is:

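sub get_year_week {

    my $date = shift;
    my $tp = parse_date( $date );    # returns a Time::Piece object

    # Bug: year() is the calendar year, while week() is the ISO week number
    return join_year_week( $tp->year(), $tp->week() );
}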

The parse_date function returns a Time::Piece object from which I obtain the ISO week number (correct) and the calendar year of the date (incorrect) instead of the ISO year.
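The mismatch is easy to reproduce in isolation. This minimal sketch assumes only Time::Piece and parses the date with its strptime class method instead of tumblelog's parse_date; it prints both values for 2008-12-29:

```perl
use strict;
use warnings;
use Time::Piece;

# 2008-12-29 is the Monday that starts ISO week 1 of 2009
my $tp = Time::Piece->strptime( '2008-12-29', '%Y-%m-%d' );

# year() is the calendar year, week() is the ISO 8601 week number,
# so pairing them yields "week 1 of 2008" for a date at the end of 2008
printf "year() = %d, week() = %d\n", $tp->year, $tp->week;    # year() = 2008, week() = 1
```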

When I had finally figured out why my test archive pages looked odd, I started looking for a solution. Since tumblelog already uses the core module Time::Piece, I preferred a solution based on this module. But I couldn't find any method whose name suggested it would return the ISO year.

Next, I found Neil Bowers' Date::WeekNumber. I checked its source, hoping to find a short code snippet I could use, but Neil relies on Date::Calc's Week_of_Year. So I looked at Date::Calc next, to see whether I could copy its algorithm into my own code, because I wanted to avoid another dependency. Moreover, I thought the ISO year would be easy to calculate with the methods Time::Piece already provides. But the code depended on other functions in Date::Calc.
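For reference, the ISO year can in fact be derived from Time::Piece methods alone, using one property of ISO 8601 weeks: they run from Monday to Sunday, and the week-based year is the calendar year of that week's Thursday. The sketch below relies on that rule; iso_year_via_thursday is a hypothetical helper, not code taken from tumblelog, Date::Calc, or Date::WeekNumber:

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds qw(ONE_DAY);

# Hypothetical helper: the ISO week-based year is the calendar year
# of the Thursday in the same Monday..Sunday week.
sub iso_year_via_thursday {
    my $tp = shift;

    # day_of_week: 0 = Sunday .. 6 = Saturday; map to ISO 1 = Monday .. 7 = Sunday
    my $iso_wday = ( $tp->day_of_week + 6 ) % 7 + 1;

    # Step forward or backward to the Thursday (ISO day 4) of this week
    my $thursday = $tp + ( 4 - $iso_wday ) * ONE_DAY;
    return $thursday->year;
}

for my $date (qw(2008-12-29 2010-01-03 2009-06-15)) {
    my $tp = Time::Piece->strptime( $date, '%Y-%m-%d' );
    printf "%s  calendar year %d  ISO year %d\n",
        $date, $tp->year, iso_year_via_thursday($tp);
}
```

Because strptime returns UTC-based objects, adding whole days never runs into daylight saving time shifts.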

Because I was sure the Python version of tumblelog works correctly, I also looked at the Python implementation of the aptly named isocalendar method. Its comments mention the resource the algorithm was taken from: The Mathematics of the ISO 8601 Calendar.
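Python's isocalendar roughly works by locating the Monday that starts ISO week 1 (the week containing the year's first Thursday) and counting days from there, switching to the previous or next ISO year at the boundaries. A rough Perl sketch of that idea follows. The helpers day_number, iso_week1_monday, and iso_calendar are hypothetical names, and day numbers are derived from UTC epochs, so the sketch assumes dates from 1970 onward:

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: days since 1970-01-01 for a date (UTC midnight,
# so the epoch is an exact multiple of 86400).
sub day_number {
    my ( $y, $m, $d ) = @_;
    my $tp = Time::Piece->strptime( sprintf( '%04d-%02d-%02d', $y, $m, $d ),
        '%Y-%m-%d' );
    return int( $tp->epoch / 86_400 );
}

# Hypothetical helper: day number of the Monday that starts ISO week 1.
sub iso_week1_monday {
    my $year    = shift;
    my $jan1    = day_number( $year, 1, 1 );
    my $weekday = ( $jan1 + 3 ) % 7;    # 0 = Monday .. 6 = Sunday (1970-01-01 was a Thursday)
    my $monday  = $jan1 - $weekday;     # Monday on or before 1 January
    $monday += 7 if $weekday > 3;       # 1 January on Fri/Sat/Sun: week 1 starts the next Monday
    return $monday;
}

# Hypothetical helper: returns (ISO year, ISO week, ISO weekday 1..7).
sub iso_calendar {
    my ( $y, $m, $d ) = @_;
    my $today = day_number( $y, $m, $d );
    my $year  = $y;

    if ( $today < iso_week1_monday($year) ) {
        $year--;    # still in the last week of the previous ISO year
    }
    elsif ( $today >= iso_week1_monday( $year + 1 ) ) {
        $year++;    # already in week 1 of the next ISO year
    }

    my $offset = $today - iso_week1_monday($year);
    return ( $year, int( $offset / 7 ) + 1, $offset % 7 + 1 );
}

printf "%04d-W%02d-%d\n", iso_calendar( 2008, 12, 29 );    # 2009-W01-1
printf "%04d-W%02d-%d\n", iso_calendar( 2010, 1, 3 );      # 2009-W53-7
```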

The algorithms given there still looked too complex to add to my code just to avoid a dependency, so I kept Googling and landed on David Farrell's "Solve almost any datetime need with Time::Piece". While his article doesn't give an explicit solution, it contains a hint in the following line of code:

$time->strftime('%Y %y %G %g');    # 2014 14 2014 14 (4 different years, really)

Hmmm, %G. Let's check the man page of strftime:

       %G     The ISO 8601 week-based year (see NOTES) with century as a
              decimal number. The 4-digit year corresponding to the ISO week
              number (see %V). This has the same format and value as %Y,
              except that if the ISO week number belongs to the previous or
              next year, that year is used instead. (TZ) (Calculated from
              tm_year, tm_yday, and tm_wday.)
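Applied to the boundary date 2008-12-29, the four format codes show the difference between the calendar year (%Y, %y) and the ISO week-based year (%G, %g). Since these codes are handed to the platform's C strftime, the output is an assumption about the C library; on glibc-based systems it is as shown in the comment:

```perl
use strict;
use warnings;
use Time::Piece;

# 2008-12-29 belongs to calendar year 2008 but to ISO week 1 of 2009
my $tp = Time::Piece->strptime( '2008-12-29', '%Y-%m-%d' );

# %Y/%y: calendar year; %G/%g: ISO 8601 week-based year
print $tp->strftime('%Y %y %G %g'), "\n";    # 2008 08 2009 09
```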

This looked promising. So I dived into the source of Time::Piece, only to discover that %G is handled by calling the C library's strftime and not by some method I had accidentally overlooked. So I settled on calling the strftime method of Time::Piece as follows:

sub get_year_and_week {

    my $tp = shift;    # a Time::Piece object

    # %G is the ISO 8601 week-based year; week() is the ISO week number
    return ( $tp->strftime('%G'), $tp->week() );
}

This works, is short, and uses a core module (core since Perl version 5.9.5).
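In an archive generator the returned pair can serve as a grouping key. The following sketch buckets a few dates by ISO year and week; the @dates list, the %archive hash, and the key format are hypothetical and only illustrate the idea, they are not tumblelog's actual code:

```perl
use strict;
use warnings;
use Time::Piece;

# get_year_and_week as shown in the post
sub get_year_and_week {
    my $tp = shift;
    return ( $tp->strftime('%G'), $tp->week() );
}

# Hypothetical input dates and archive hash, for illustration only
my @dates = qw(2008-12-28 2008-12-29 2009-01-01 2009-01-05);
my %archive;

for my $date (@dates) {
    my $tp = Time::Piece->strptime( $date, '%Y-%m-%d' );
    my ( $year, $week ) = get_year_and_week($tp);
    my $key = sprintf '%s-W%02d', $year, $week;    # e.g. 2009-W01
    push @{ $archive{$key} }, $date;
}

# Zero-padded keys sort chronologically as plain strings
for my $key ( sort keys %archive ) {
    print "$key: @{ $archive{$key} }\n";
}
```

This prints 2008-W52 for 2008-12-28, groups 2008-12-29 and 2009-01-01 together under 2009-W01, and puts 2009-01-05 under 2009-W02.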

Finally, I wrote some code to verify that %G returns the ISO year, using the table in the Wikipedia article "ISO week date" as input. To get output that includes the day number, I used:

$tp->strftime('%G-W%V-%u');

My test code produced the correct result for every example in the table.
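A check along these lines could use the core Test::More module. This is a sketch, not tumblelog's test suite: the expected values are year-boundary dates with their ISO week dates, the same kind of cases as in the Wikipedia table, and iso_week_date is a hypothetical helper. As before, %G, %V, and %u are passed to the C library's strftime, so the results assume a library that supports them (glibc does):

```perl
use strict;
use warnings;
use Time::Piece;
use Test::More;

# Year-boundary dates and their ISO 8601 week dates (YYYY-Www-D)
my %expected = (
    '2005-01-01' => '2004-W53-6',
    '2005-01-02' => '2004-W53-7',
    '2005-12-31' => '2005-W52-6',
    '2007-01-01' => '2007-W01-1',
    '2007-12-30' => '2007-W52-7',
    '2007-12-31' => '2008-W01-1',
    '2008-01-01' => '2008-W01-2',
    '2008-12-28' => '2008-W52-7',
    '2008-12-29' => '2009-W01-1',
    '2008-12-31' => '2009-W01-3',
    '2009-01-01' => '2009-W01-4',
    '2009-12-31' => '2009-W53-4',
    '2010-01-03' => '2009-W53-7',
);

# Hypothetical helper: ISO week-based year, ISO week, and ISO weekday (1 = Monday)
sub iso_week_date {
    my $date = shift;
    my $tp   = Time::Piece->strptime( $date, '%Y-%m-%d' );
    return $tp->strftime('%G-W%V-%u');
}

for my $date ( sort keys %expected ) {
    is( iso_week_date($date), $expected{$date}, $date );
}

done_testing();
```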

An updated version, 1.0.7, of tumblelog will be pushed to GitHub tomorrow.
