Posts Tagged ‘whipuptitude’

Where can I fly to this month?

June 1st, 2009 1 comment

All my playing with end-of-year travel plans has given me itchy feet. I’d like to go somewhere interesting for a few days sometime soon, but I don’t really care so much where. This is something the internets are meant to help with, but though the US is well served with any number of useful quirky travel sites, Europe doesn’t have so many of the “Just show me good deals” versions if you don’t live in certain key cities. So, in the DIY spirit, I wrote my own. I gathered a list of all the commercial airports I could find in Europe, grouped them by country, and wrote a script that searched ITA in turn for all flights from Tallinn to any airport in each country over the next 30 days, and told me the cheapest date to travel there. It’s a slightly nasty site to screen-scrape (and I’m pretty sure they don’t have any alternatives that you don’t have to pay for, as some of the puzzles they set job applicants involve scraping the site), and the code certainly isn’t pretty, but, thanks to Google Charts, the results are:

(Green is the cheapest, red the most expensive, yellow somewhere in between.)

My plan is to widen this beyond Europe, have it run every day, set some thresholds and have it email me any time something interesting appears. I suspect, however, that I’m much better served from Riga:

Thankfully there’s a comfortable bus there!

More bmi Hacking

May 26th, 2009 3 comments

Star Alliance claim to be ‘committed to delivering to you the latest flight schedules from the Star Alliance members on multiple platforms Anytime, Anywhere.’ (emphasis mine). What’s more, they go on to explain that this means it will be ‘Automatically updated on your platform of choice.’

That is unless your ‘platform of choice’ is anything other than a Windows PC or a handheld with Palm OS, as their Electronic Timetable doesn’t run on, for example, a Mac. Instead we need to just make do with a hulking big PDF.

So, I decided to parse all the data out of that PDF, and on the basis that others might find it useful, make it available as a CSV file: Star Alliance Timetable 2009-05.

It’s nothing fancy, but being able to open it in Excel and filter on the various columns is still quite useful, and of course it opens up any number of other possibilities. I’m also considering building a little mini-application that makes it easier to play with, so if anyone has any suggestions for that, I’m all ears.

bmi Hacking

May 25th, 2009 No comments

I’ve been a bmi Diamond Club holder for many years. Unlike most Frequent Flier programs, airmiles you earn in this scheme never expire, so I’ve built up quite a few of them. However, it’s looking increasingly likely that bmi won’t actually be around for much longer — at least not in its current form. The most likely outcome seems to be a takeover by Lufthansa, and subsequent conversion of Diamond Club to their nowhere-near-as-good Miles and More scheme. So it’s looking like a good time to turn all my airmiles into a fun end-of-year escape-the-Tallinn-winter trip.

I’ve spent quite a bit of time over the last week learning how best to go about that, and discovering all manner of interesting ways of combining the various rules. (Much of this is learned from the fine folks at Flyer Talk, which, once you can get beyond all the jargon, is an amazing source of tips, tricks, and useful advice.)

The first thing you need to get the hang of is the bmi zone chart. Rather than spending miles based on the actual distance you fly, the world is divided up into a series of zones, and you pay a fixed rate per flight based on the zones you’re flying to/from. (This is purely in terms of the miles spent—you still need to pay the taxes depending on the airports you use, which, of course, differ everywhere.) I found it hard to keep track of which countries were in which zone, so I drew a pretty map.

The biggest problem with constructing a suitably interesting trip is that you’re only allowed one stop-over (visiting a city en-route for more than 24 hours) per ticket. So, for example, if you were to book a return from London to Sydney you’d only be allowed to stop off in one other place (e.g. Singapore) in either direction. However, you can purchase one-way tickets, so by getting two of those, instead of a return, you now get a stop-over in each direction, so could stop, for example, in Singapore for a couple of weeks on the way there, and Thailand on the way back.

What I then noticed was that to go from Zone 2 (Central/Eastern Europe — where I currently am) to Zone 10 (Australia/NZ — where I want to go) is 50,000 miles each way, but two singles from Zone 2 to Zone 8 (East Asia) and then Zone 8 to Zone 10 are only 25,000 each. Thus, by going via South Korea or Japan, for example, you can effectively get 3 free stops in each direction – effectively turning a naïve two destination trip (e.g. Copenhagen – (Bangkok) – Auckland – Copenhagen) into a seven destination trip for the same price (e.g. Copenhagen – (Bangkok) – Tokyo – (Hong Kong) – Auckland – (Sydney) – Seoul – (Delhi) – Copenhagen)! These are all published Star Alliance routes: Air Asiana, for example, fly Seoul to Copenhagen via Delhi and Zurich three times a week.

If you really wanted to, you could also (again, for the same price) omit the last ticket, and return Auckland–Copenhagen via L.A. or Vancouver turning it into a complete round the world trip at half the mileage cost of an actual round-the-world ticket!

I wrote a little script to analyse the entire zone chart for other free multi-zone detours, and discovered there were quite a few of them (including some where the detour actually lowered the total price, such as Zones 2–7 via 10 which is only 70,000 miles, instead of 80,000 direct!)
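
The original script isn’t reproduced here, but the idea can be sketched roughly as follows. This is a minimal Python sketch, not the script I actually ran: the chart contains only the handful of figures quoted in this post (the Zone 7 to Zone 10 price is inferred from the 70,000 total), and it assumes prices are the same in both directions.

```python
from itertools import permutations

# Partial zone chart: miles for a one-way ticket between two zones.
# Only the figures quoted in the post are included; the real chart is larger.
CHART = {
    frozenset({2, 8}): 25_000,
    frozenset({8, 10}): 25_000,
    frozenset({2, 10}): 50_000,
    frozenset({2, 7}): 80_000,
    frozenset({7, 10}): 20_000,  # inferred: 70,000 via Zone 10 minus 50,000
}


def cost(a, b):
    """Miles for one ticket between zones a and b, or None if not in the chart."""
    return CHART.get(frozenset({a, b}))


def free_detours(zones):
    """Find (start, end, via, direct, split) where two tickets cost no more than one."""
    found = []
    for start, via, end in permutations(zones, 3):
        if start > end:
            continue  # prices are assumed symmetric, so check each pair once
        direct, first, second = cost(start, end), cost(start, via), cost(via, end)
        if None in (direct, first, second):
            continue  # chart has no price for one of the legs
        if first + second <= direct:
            found.append((start, end, via, direct, first + second))
    return sorted(found)


ZONES = sorted({zone for pair in CHART for zone in pair})
DETOURS = free_detours(ZONES)

for start, end, via, direct, split in DETOURS:
    print(f"Zone {start} to {end} via {via}: {split:,} miles instead of {direct:,}")
```

With the full chart typed in, the same loop would list every pair of zones where splitting the journey is free or cheaper.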

Of course, the longer the route, the more complexity there is in trying to piece it all together. You get significantly more value spending the miles on business class flights than on economy, but availability on those disappears quite far in advance on popular routes (and isn’t available at all on many Singapore Airlines flights as they reserve those for their own card-holders rather than their Star Alliance partners). But I’m currently contemplating trying to piece together a 2-10-7-9-8-2 route, which is only 110,000 base miles, and would theoretically allow something along the following lines:

Riga – (Cairo) – Bombay – (Bangkok) – Manila – (Tokyo or Sydney) – Auckland – (Shanghai) – Tashkent or Almaty – (Istanbul) – Riga.

Which, if I can pull it off, isn’t bad for only 10,000 miles more than a simple Riga–Auckland return! Suggestions / alternatives / gotchas / etc. welcomed!

Splitting a WordPress blog in two

May 13th, 2009 No comments

This blog had its seventh birthday recently. I know there are many amongst you who have been blogging since before the term was even coined, and who make more posts in a month than I’ve made in seven years, but still.

Anyway, back in the early days of blogging, a significant percentage of blog posts weren’t original content, but the equivalent of retweeting: a way of passing on to your readers something interesting you’d read elsewhere. Of course the vast majority of those were links to other people’s blogs. It’s how word spread about interesting posts before digg and reddit and twitter and the like.

I tried to do something slightly different for a while: rather than just regurgitating other blog posts, I instead regurgitated interesting snippets from real dead tree books I was reading, picking interesting excerpts chapter by chapter.

It seemed to be well received, and I had a lot of fun choosing which couple of paragraphs from each chapter could convey something interesting enough to both stand alone without the surrounding context and also encourage others to seek out the book for more depth.

Early in 2004, I seem to have abandoned the idea. Likely it’s just because I was super-busy with Twingle, and then with Unite, and I probably always meant to get around to picking it up again, but just never did. Until now.

I decided, however, to do this on a new separate blog: dustyvolumes.com. So I had to work out how to move all the old posts there. This was significantly more complicated than I expected. Doubtless someone will point me to a WordPress plugin that could have made the whole thing take 30 seconds, but in the absence of that, here are the gory details for anyone else who ever wants to do something like this.

First, of course, I needed to have the new blog set up. I’m assuming that’s self-evident, and needs no further explanation.

Next I needed to find all the posts I wanted to move. I already had them all tagged with “Books”, so this part was fairly easy and avoided an even longer manual process. WordPress doesn’t have an ‘export by tag/category’ option, though—the only way to restrict an export is by author. So I had to go into “Posts > Edit”, find a post with the relevant tag, and click that tag to give me a list of all those posts. Then I could do a Bulk Edit of each to change the author to a new temporary account I set up just for this purpose. There were multiple pages of them, and there doesn’t seem to be a way to operate on more than one page at a time, so I went through them page by page. It was repetitive enough to make me want to find a short-cut, but there weren’t quite enough pages to make it worthwhile.

Then I exported all the posts by my new author, and imported those into the new blog. I did some more tidying up there of tags and categories etc, and found a few posts that should probably still remain on this blog instead (they were tagged with Books too, but were, for example, about me getting rid of my collection before moving to Estonia, rather than being excerpts suitable for Dusty Volumes), so deleted them from there, and changed the author here back to me on each of them in turn (I wanted that author to match exactly the posts that were on the other blog so I could continue to operate on those here).

Now I had the new blog working, but hit the much harder problem of what to do about the posts here. I could, of course, just have deleted the posts that I’d moved, but I still get quite a few hits on them from Google searches and links from other blogs, as well as some internal links to them, and I didn’t want to break all those. After some research I found a couple of WordPress plugins for setting up redirection. The first one I tried, “Redirection”, has lots and lots of features, but wasn’t quite what I wanted. The second, “Redirect”, was perfect. It does only one thing, but does it simply, and does it well. Using the Custom Field options in WordPress, it lets you set a ‘Redirect’ field with a value of the URL that viewers should be redirected to on viewing a given post. So now it was just a matter of going through and setting those up one by one.

Thankfully the WordPress import maintains the post ID from the export, so I didn’t need to spend any time building a map of which IDs should map where: each relevant post would just need to redirect to http://dustyvolumes.com/archives/<id>. I did a couple of these manually to make sure everything was working, but there was no way I wanted to do another 150 or so by hand. It was time to go to the database.

I’ve never actually explored the WordPress schema before, but there aren’t very many tables, and it’s fairly easy to work out what’s going on. (There’s probably decent documentation for it all too, but I tend to prefer to just work things like this out manually.) I’m not going to detail all the SQL commands I had to run: if you don’t know enough to work them out yourself you probably shouldn’t be playing with the database directly anyway, and should just do this the longwinded way (and I really don’t want to be fielding questions on it 6 months from now when the schema has changed). But it was a simple matter to just select the IDs of all posts by my fake ‘author’, and insert the relevant Redirect custom field values.

However, this still left a large number of ‘Books’ entries in my tagcloud that really weren’t there any more, so I also wanted to remove all the tags from these posts too. Ideally the Bulk Edit should be capable of this, but it currently only allows you to add a tag to multiple posts, not remove one, so again I went to the database. This one was slightly trickier, as it’s a cross-table DELETE, but again, if you don’t know how to do that, you shouldn’t just be pasting in random SQL you found on someone’s blog somewhere.

Unfortunately, although that successfully removed all the tags, the tag cloud still proudly declared that I had a huge number of “Books” posts. WordPress, presumably for speed, keeps a total of how many posts are assigned to each category in a different table, and, being a typical modern webapp, maintains that count in client code rather than in the database itself. So having manually removed lots of tags without updating the count field too, my database was now out of sync with itself. MySQL doesn’t do cross-table UPDATEs with aggregates, so this time I needed an UPDATE with a subselect of a COUNT(*).
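
To show just the shape of that last statement (and nothing you should paste into a live blog), here is a small self-contained Python sketch. The two tables, `tag_counts` and `post_tags`, are made-up stand-ins rather than the real WordPress schema, and it runs against an in-memory SQLite database rather than MySQL.

```python
import sqlite3

# Hypothetical, simplified tables: NOT the real WordPress schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag_counts (tag_id INTEGER PRIMARY KEY, name TEXT, post_count INTEGER);
    CREATE TABLE post_tags  (post_id INTEGER, tag_id INTEGER);

    -- The cached count for 'Books' is stale: only one tagged post remains.
    INSERT INTO tag_counts VALUES (1, 'Books', 150), (2, 'Travel', 2);
    INSERT INTO post_tags  VALUES (10, 1), (11, 2), (12, 2);
""")

# Recompute every cached count from the rows that actually exist,
# using an UPDATE with a correlated subselect of a COUNT(*).
conn.execute("""
    UPDATE tag_counts
       SET post_count = (SELECT COUNT(*)
                           FROM post_tags
                          WHERE post_tags.tag_id = tag_counts.tag_id)
""")

COUNTS = dict(conn.execute("SELECT name, post_count FROM tag_counts"))
print(COUNTS)
```

Running a SELECT of the same subquery first, before committing to the UPDATE, is the cautious way to check that the numbers look right.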

Including lots of cautious exploratory SELECTs, lots of LIMITs of my UPDATEs and DELETEs to make sure the right thing was happening each time, and backing up carefully after each major change, the whole thing took about an hour. I could possibly have done it all via the web interface in that time, but it would have been a close call, and there was a very high chance that I’d have gotten so bored in the middle of it that I’d have abandoned it half-way through, promising to finish it another day (and likely never quite gotten around to it). This way was mentally stimulating rather than draining, thus giving much more satisfaction when done, and I learned much more about the WordPress database structure that could be very useful if I ever decide to write a Plugin.

And now I have two blogs to rarely write in…

Migrating Movable Type to WordPress

February 6th, 2006 No comments

The server on which my weblog used to run is getting rather old and crumbly, and brings with it a constant low-level dread that some day, real soon now, it’s going to give up the ghost. So for the last while, we’ve gradually been moving everything off it. When it came time to move this site, I decided that it was also a good opportunity to migrate away from Movable Type, mostly for the same philosophical reasons that Mark Pilgrim has already set out.

I considered moving to Typo, following Piers’ lead, mostly just so I could play with Ruby, but in the end I decided on WordPress. I’ve made the mistake too many times now of choosing software based on the language in which it’s written. Yes, it would be nicer to be able to hack on my weblog in Ruby or Perl than in PHP, but I know enough PHP to get by, and I doubt I’ll be doing that much hacking anyway.

Setting the weblog up was fairly trivial, as most good PHP installations tend to be. Migrating all my old content wasn’t quite so simple. WordPress 2 seems to have made the import process much simpler than before; most of the information on the process I’ve found relates to older versions and isn’t really applicable any more. Unfortunately, although the simple case of importing my MT archive was fairly painless, I really didn’t want to break all my old links.

There are a few sites that discuss how to maintain your Movable Type post IDs, but they all seem to relate to the old WordPress process. So I had to get my hands dirty in PHP much quicker than expected.

Firstly, I had to edit MT/App/CMS.pm in my MT setup, adding a line to include the entry id in the export output:

AUTHOR: <$MTEntryAuthor$>
TITLE: <$MTEntryTitle$>
ID: <$MTEntryID$>
STATUS: <$MTEntryStatus$>

Then I was able to export all my posts.

I had to post-process the output file, however, as I’ve been creating my posts using the MT Kwiki plugin. This meant that none of my links imported correctly. I spent much too long wrestling with vim’s non-greedy regular expressions before giving up and processing the data in Perl instead:

perl -pe 's/\[(http:.*?) (.*?)\]/$2/g' mt-dump.txt |
perl -pe 's/\[(.*?) (http:.*?)\]/$1/g' > deWikied.txt
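
For anyone who would rather test the substitutions before running them over a whole export, the same two non-greedy patterns can be sketched in Python. This is only an illustration: the function name and sample strings are made up, and it assumes each Kwiki link sits on a single line in square brackets, with the URL either first or last.

```python
import re


def strip_kwiki_links(text):
    """Replace [url label] and [label url] links with just the label."""
    # URL first, label second: keep the second group.
    text = re.sub(r"\[(http:.*?) (.*?)\]", r"\2", text)
    # Label first, URL second: keep the first group.
    text = re.sub(r"\[(.*?) (http:.*?)\]", r"\1", text)
    return text


print(strip_kwiki_links("See [http://example.com/ my site] now"))
print(strip_kwiki_links("See [my site http://example.com/] now"))
```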

Then I had to persuade WordPress to maintain the MT ids. The old WordPress import script just inserted the posts by hand, and it was a simple matter of ‘fixing’ the SQL it used to do this. But now the importer calls the same code that is used when you create a post through the normal interface.

So I needed to add a check for the ID into the import/mt.php script:

case 'AUTHOR' :
    $post_author = $value;
    break;
case 'ID' :
    $post_ID = $value;
    break;

And then fix the call that inserts the data:

$postdata = compact('post_ID','post_author', 'post_date', 'post_date_gmt', ... );

Then I needed to adjust the wp_insert_post() call to cope with an incoming post_ID:

  if ( !isset($post_ID) )
      $post_ID = 0;
  if ( !isset($post_password) )
      $post_password = '';

and adjust its SQL accordingly

"INSERT IGNORE INTO $wpdb->posts (id, post_author, post_date, ... ) VALUES
  ($post_ID, '$post_author', '$post_date', ...)");

(The arguments are passed as an extract() of a get_object_vars(), so there’s no need to change any of the other handling).

I believe that this is a safe enough approach that won’t interfere with creating new posts or editing old ones, but you can always revert this file back after importing if there are any problems.

With this in place, I was able to import all my old posts. (There were a lot of them, so I actually had to split the file and import 4 segments in turn). The other thing that the docs don’t make clear is that you need to have an upload directory which is writable by your webserver, but that was easy enough to work out from the error message.

They all came in with the same IDs as they used to have, so then it was just a matter of setting up some Apache redirects on the old server:

Redirect permanent /nothing/index.rdf http://nothing.tmtm.com/feed/
RedirectMatch permanent /nothing/archives/([0-9]{6}).html http://nothing.tmtm.com/archives/$1
RedirectMatch permanent /nothing/archives/([0-9]{4})_([0-9]{2}).html http://nothing.tmtm.com/archives/date/$1/$2
RedirectMatch permanent /nothing/archives/([0-9]{4})_([0-9]{2})_([0-9]{2}).html http://nothing.tmtm.com/archives/date/$1/$2/$3

(I’ve already changed my permalink structure in WordPress to have this style of URL)
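
A mapping like this is easy to get subtly wrong, so it can be worth checking the patterns offline before touching the server. The following Python sketch mirrors the rules above; the function name is made up, it treats the feed redirect as an exact match (Apache’s Redirect matches by path prefix), and it escapes the dot before “html”, which the Apache patterns above leave unescaped.

```python
import re

NEW_SITE = "http://nothing.tmtm.com"

# (old-path pattern, new-path template) pairs mirroring the RedirectMatch rules.
RULES = [
    (re.compile(r"/nothing/archives/([0-9]{6})\.html"), r"/archives/\1"),
    (re.compile(r"/nothing/archives/([0-9]{4})_([0-9]{2})\.html"),
     r"/archives/date/\1/\2"),
    (re.compile(r"/nothing/archives/([0-9]{4})_([0-9]{2})_([0-9]{2})\.html"),
     r"/archives/date/\1/\2/\3"),
]


def new_url(old_path):
    """Return the URL an old Movable Type path should redirect to, or None."""
    if old_path == "/nothing/index.rdf":
        return NEW_SITE + "/feed/"
    for pattern, template in RULES:
        match = pattern.search(old_path)
        if match:
            return NEW_SITE + match.expand(template)
    return None


print(new_url("/nothing/archives/000123.html"))
print(new_url("/nothing/archives/2004_11_12.html"))
```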

There will be many more things to change later to replicate the changes I’d made to my MT set-up, but this at least gets me up and running on WordPress.

Belfast City Council Minutes

November 11th, 2005 No comments

I think I have found a new entrant for the Worst Software Ever™ awards: the Belfast City Council on-line minutes system.

There are at least three versions of this on-line. After talking with a consultant for the Corporate Applications Team I now know that two of these versions are obsolete. However, if you were to search in Google for, say, “Belfast City Council minutes”, those are the ones you will find. The “true” system is one you get if you click on the “Minutes” link from the main City Council website.

This new system doesn’t suffer from all the same faults as the old system (e.g. file name paths being returned from the search as \ rather than /, making the documents inaccessible), but still suffers from most of them. The worst is that you can’t actually browse the minutes in any sensible manner – you can only search them. This requires knowing what you’re actually looking for, which is no use to me, as I’m just wanting to see what the Council is up to generally, rather than on anything specific. To make any serious use of the system you also need to know that Council meetings happen in a three level hierarchy:

The main monthly council meeting mostly just accepts (or rejects) minutes from the Committees (Client Services, Contract Services, Development, Health & Environmental Services, Policy & Resources, and Town Planning). Most of those, in turn, do likewise for their Sub-Committees (for example Policy & Resources has sub-committees for Drug Misuse, Finance/Admin/IS, Members, Personnel, and Policy & Performance Review). So if you want to read about what the council is actually doing, you need to find the relevant sub-committee minutes.

Of course these committees and subcommittees have changed over time as well, so searching in “Finance, Administration and Information Systems” will only take you back to 1997. In 1996/97 it was Finance & Admin (with a separate Information Systems sub-committee), and until 1995 it was just Admin.

The new system seems to have tried to make this process easier by making you specify whether the Committee Status is CURRENT or HISTORIC, but really that doesn’t help. (Particularly as you can only find out that those are the options by reading the 5 page User Guide PDF, or by clicking a button which eventually pops up a window asking you to select from those in a very clumsy way).

Of course, wrestling with the search is only the first problem. When you find an interesting set of minutes then you have to find a way to read them. There are little check-boxes beside all the results, but it’s not exactly obvious what they’re for. Clicking on the little “view” icon has an annoying habit of crashing Firefox, so the download link is probably what you want – assuming you can actually read Microsoft Word documents. If you can’t read Word, you’ll need to persevere with the “view”, which does a reasonable job of converting the Word document, although it doesn’t really like any councillor’s name containing a fada.

When you finally have a set of minutes to read, you’ll probably then discover that they’re not that useful unless you can also find all the referenced committee and sub-committee minutes, and probably several sets of historic minutes as well. Which, of course, means wrestling with the search system some more. And then there are lots of references to various Standing Orders and the like which probably won’t make a lot of sense to you. And of course if you want to know the party affiliations of all the people voting against certain proposals and the like you’ll need to go look those up somewhere else too. The information is all available of course – we are now living in a post-FOI country. But it’s not easy to find or make sense of.

So, I did what any self-respecting new-technology-aware Freedom-of-Information-loving geek would do, and created a wiki for it.

Please welcome http://nigov.tmtm.com/. I’ve populated it with some basic information on Belfast City Council, its members, departments, committees etc., and added all the 2005 minutes I can find. The system seems to only have minutes up to the May election, but that’s still over 100 sets of minutes.

They’re all just cut’n’paste from Word for now, and need a lot of tidying and cross-linking, but it’s a useful start.

Anyone who wants to help out please join in.

Maybe in a few years’ time, when it comes time to vote, people will be able to look at what the candidates have actually done during their time in office. And for those few people in Northern Ireland who don’t just vote along politico-religious lines, this might actually be useful!

Last Day of Month in Excel

December 15th, 2004 24 comments

My next Excel Top Tip is how to calculate the last day of a month. I thought there was a function to do this as part of a whole suite of date manipulation functions, but I seem to have imagined that, as I can’t find it now. I was dreading having to do lots of nasty date arithmetic, but then I discovered that the zeroth day of a month is treated as the last day of the previous month! So the last day of the month for the date that’s in cell B3 is simply:

=DATE(YEAR(B3), MONTH(B3)+1, 0)

It even works across year boundaries, so DATE(2004, 13, 0) really does give December 31st 2004!
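
The same trick (step to the start of the following month, then back one day) carries over to other languages. As a rough illustration outside Excel, here is a short Python sketch; the function name is made up, and it uses only the standard library.

```python
from datetime import date, timedelta


def last_day_of_month(day):
    """Return the last day of the month containing `day`."""
    # Work out the first day of the following month; December rolls
    # over into January of the next year, like month 13 in Excel's DATE.
    year = day.year + day.month // 12
    month = day.month % 12 + 1
    # The day before the first of next month is the "zeroth" day.
    return date(year, month, 1) - timedelta(days=1)


print(last_day_of_month(date(2004, 12, 15)))
print(last_day_of_month(date(2004, 2, 10)))
```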

The Joys of CSV

November 12th, 2004 No comments

I’ve been working with CSV files a lot recently, mostly as a way of building web based management information tools out of SAGE data.

But I’ve always really hated working with the interface to Text::CSV_XS. So I put together Text::CSV::Simple. You just point it at the file you want, and read out all the rows:

my $parser = Text::CSV::Simple->new;
my @data = $parser->read_file($datafile);

You can tell it you only want certain fields:

$parser->want_fields(1, 2, 4, 8);

And that you want the results straight into a hashref rather than just a listref:

$parser->field_map(qw/id name null town/);

There are also trigger points where you can pre- and post-process the data.

It’s certainly made dealing with CSV much easier for me. And it seems to be useful for other people too, as within a few weeks of its release I’ve had several feature requests and bug reports. Usually it takes a couple of months for a new module of mine to build up enough steam to get that.

However, I’ve now had several people all report a problem that I didn’t even consider before: it doesn’t handle newlines in strings. This disturbed me, as I hadn’t realised until then that CSV files could actually contain embedded newlines! Of course, I can’t find any sensible documentation anywhere of what the CSV file format actually does and doesn’t allow, as it seems that Microsoft just made it a de facto standard by making it the main export format from Excel, without ever really specifying how it can be used. The few sites that I found that claim to provide more details on the format are contradictory (e.g. over the issue of header rows).

But it certainly does seem that linebreaks are acceptable, as long as they’re properly quoted. This shoots my whole approach to parsing the files apart, and means I’m going to have to go back and pretty much rewrite the module from scratch, and I may even have to lose one of my trigger points, as I still want to use Text::CSV_XS to do the actual parsing for me, but I’ll need to hook in at a different level now.
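
The problem is easy to see in miniature. The sketch below uses Python’s standard csv module rather than Text::CSV_XS, purely as an illustration, and the sample data is invented: splitting the file on line endings gives one “row” too many, while a quote-aware parser keeps the record whole.

```python
import csv
import io

# Invented sample: the name field of the first record contains a line break,
# which is allowed because the field is quoted.
RAW = 'id,name,town\r\n1,"Smith,\nJohn",Belfast\r\n2,Jones,Tallinn\r\n'

# Treating every line as a record splits the quoted field in two.
NAIVE_ROWS = RAW.splitlines()

# A quote-aware parser reads across the embedded newline.
PARSED_ROWS = list(csv.reader(io.StringIO(RAW, newline="")))

print(len(NAIVE_ROWS), "lines, but", len(PARSED_ROWS), "records")
print(PARSED_ROWS[1])
```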

Of course I face my normal Open Source dilemma with this. The code clearly has a bug, but it’s not one that has any effect on me. None of the CSV files I have to deal with have linebreaks inside records. If the code wasn’t released, I’d apply my XP YAGNI principles, and defer the fix until I needed it. In some ways I’d like to be able to tell people who reported the bug that I’ll happily accept a patch if they can fix it, but otherwise they’ll have to wait until I need it. But having public code out there with known bugs irks me, so I guess I’ll just have to find the time from somewhere to fix it myself!

MT Amazon Reading List

November 11th, 2004 No comments

I’ve been asked which plugin I’m using to generate my “reading list” over on the sidebar. Like any true geek, of course, I actually wrote my own. Of course I’m generally into reuse where possible, but I wanted to learn how to write MT plugins, and it seemed like a good place to start. It also helped that I didn’t like any of the 3rd party plugins out there for this. There are probably much better ones available now. Mine also isn’t very good, but it’s part of a bigger plan…

Over the last year or so I’ve gradually been drinking the semantic web kool-aid. I’m sure I’ll rant more about this later: I don’t believe it’s going to happen the way most people have been pushing for it, but I’m now convinced that it’s going to happen.

Of course I’m part of the problem for making it happen, as I’m a data geek. I collect structured information. My friends laugh at the fact that I could run queries to tell you how much I’ve spent on milk in the last year, but I find the information useful. (Well maybe not that information, exactly, but the general principle of being able to analyze my spending…)

Unfortunately most of the people wanting to make the semantic web happen are also data geeks who believe in structured information, even though the vast majority of the world aren’t. This is a very big problem for traditional semweb thinking, but I no longer think it matters very much.

But, in the meantime, I want to do stuff with my structured information. Such as the list of books I’ve read.

The first problem was how to store them. I’m reasonably well known to be a database guy. I also have a simple framework for building simple web apps to manage databases, so I considered building one for managing my books. But that seemed like too much hassle for now – I really wanted to just edit a file when I started reading a new book.

Faced with this problem, most techies these days seem to instinctively reach for XML. Personally I can’t stand it. I really hate how verbose it is. Unfortunately a large part of the Semantic Web work is also based around XML. Theoretically you can express your RDF in other ways, but really almost everyone is using XML. This used to bother me as I thought I’d need to do this, but now I believe that the more obscure and arcane we can make this stuff the better, as then everyone will want tools to do it, and only masochists will end up doing it by hand.

So for my books I, instead, reached instinctively for YAML. I thought for a while about what information I’d want to store, before realising that I was much too lazy to want to type any information that could be found elsewhere. So my YAML file really just includes the ISBN of the book, and the rough date that I read it. Of course I don’t usually read a book in one day – I quite often read 4 or 5 books simultaneously over a period, just to get an interplay of ideas happening. And there are lots of books I start, read about half of, and don’t get round to finishing for months, or sometimes even years (if at all). I spent a while trying to find a sensible way to model that, before deciding it was all much too complex, and I’d be happy enough with just entering a rough date.

So I ended up with a very basic YAML file:

---
books:
 
   - isbn    : "0596007515"
     title   : "Ggl Hacks"
     date    : "2004-11-01"
     current : 1
 
   - isbn    : "0439977789"
     title   : "Ruby / Smoke"
     date    : "2004-11-01"
 
   - isbn    : "075093204X"
     title   : "Decline and Fall Everybody"
     date    : "2004-10-09"

The ‘title’ field is there just as a placeholder to aid human readability. It never actually gets used anywhere, so I can fill it with shorthand etc. The ‘current’ field is for books I’m still reading. This is my token concession to the “I started this a month ago but haven’t finished yet” problem.

The next phase is to turn that into a more detailed YAML file that includes proper titles, Amazon links, cover URLs etc.

I have a small perl script to do that:

#!/usr/bin/perl
 
use strict;
use warnings;
 
use YAML;
use Net::Amazon ();
use Cache::File ();
 
my $yaml = YAML::LoadFile(shift || "reading-yaml.txt");
my @out = map expanded_data($_), @{ $yaml->{books} };
print Dump { books => \@out };
 
sub expanded_data {
  my $book = shift;
  my $property = get_book(sprintf "%010s", $book->{isbn});
  return {
    %$book,
    isbn  => sprintf( "%010s", $book->{isbn} ),
    title => $property->title,
    img   => $property->ImageUrlSmall,
    url   => $property->url,
  };
}
 
BEGIN {
  my %amzn_opt = (
      token        => "MY_AMAZON_KEY",
      affiliate_id => "tmtm-20",
      cache        => Cache::File->new(
        cache_root      => '/tmp/amzn_cache',
        cache_umask     => 000,
        default_expires => '30 day',
      ),
  );
  my $us = Net::Amazon->new(%amzn_opt);
  my $uk = Net::Amazon->new(%amzn_opt, locale => "uk");
 
  # Try Amazon US first (better odds of a cover scan), then fall back to UK.
  sub get_book {
    my $isbn = sprintf "%010s", shift;
    my $resp = $us->search( asin => $isbn );
    $resp = $uk->search( asin => $isbn ) unless $resp->is_success;
    die "Can't find $isbn" unless $resp->is_success;
    my ($property) = $resp->properties;
    return $property;
  }
}

It simply reads in my raw book file, uses Amazon Web Services to look up more data about the books (storing the data in cache for 30 days to speed the whole thing up on later runs), and throws out a new YAML file with more fields. Amazon US has slightly more likelihood of having cover scans, so I check it first, falling back on the UK if there are no results there. I pick up a lot of my books in the US anyway, so it isn’t that much of an issue, although I occasionally get a different cover from the one that I have.
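The "try one store, fall back to the other, and cache the answer" idea can be sketched without any network access. Everything here is an assumption made for illustration: the two hashes stand in for the Amazon locales, the in-memory %cache stands in for the 30-day file cache, and the records are placeholders rather than real Amazon data. The real script hands its cache to Net::Amazon rather than caching in get_book itself.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two Amazon locales: ISBN => record.
my %us_store = ( "0596007515" => { img => "us-cover.jpg" } );
my %uk_store = (
    "0596007515" => { img => "uk-cover.jpg" },
    "075093204X" => { img => "uk-only-cover.jpg" },
);

my %cache;          # in-memory stand-in for the file cache
my $lookups = 0;    # how many times a "store" was actually consulted

sub lookup {
    my ($store, $isbn) = @_;
    $lookups++;
    return $store->{$isbn};    # undef when the store has no such book
}

sub get_book {
    my $isbn = sprintf "%010s", shift;    # zero-pad to ten characters
    return $cache{$isbn} if exists $cache{$isbn};
    # First choice, then fallback; give up only if both miss.
    my $record = lookup(\%us_store, $isbn) || lookup(\%uk_store, $isbn)
        or die "Can't find $isbn\n";
    return $cache{$isbn} = $record;
}

print get_book("0596007515")->{img}, "\n";    # found in the first store
print get_book("075093204X")->{img}, "\n";    # first store misses, second hits
get_book("0596007515");                        # answered from the cache
print "store lookups: $lookups\n";
```

The third call never touches either store, which is the whole point of the cache on later runs.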

Then I have a simple MT plugin, called mt-reading.pl, which I drop straight into my MT/cgi-bin/plugins/ directory:

package MT::Plugin::ReadingList;
 
use lib '/usr/local/MT/cgi-bin/lib';
 
use MT::Template::Context;
use Data::BookList;
 
MT::Template::Context->add_container_tag(
  ReadingList => sub {
    my ( $ctx, $args ) = @_;
    my $builder = $ctx->stash('builder');
    my $tokens = $ctx->stash('tokens');
 
    my $yaml_src = $args->{src}
      or return $ctx->error("No YAML source file specified.");
 
    my $list = Data::BookList->new($yaml_src)
      or return $ctx->error("Invalid YAML source file");
 
    my $content = "";
    for my $book ( $list->reading_list($args) ) {
      $ctx->stash( book => $book );
      $content .= $builder->build( $ctx, $tokens );
    }
    return $content;
 
  }
);
 
MT::Template::Context->add_tag(
  ReadingListBook => sub {
    my $book = shift->stash('book');
    my $args = shift || {};
    # Build the linked cover image once and keep it on the book.
    $book->{cover} ||= sprintf qq{<a href="%s"><img
      border="0" alt="%s" src="%s" /></a>},
        $book->{url}, $book->{title}, $book->{img} || "";
    return exists $args->{display}
      ? $book->{ $args->{display} }
      : $book->{cover};
  }
);
 
1;

This simply adds two new tags ‘ReadingList’ and ‘ReadingListBook’ that I can add to my MT templates, and have them expanded at build time.

So, in my template I include something like this:

<p>Recent Reading</p>
<div class="book">
  <MTReadingList src="/path/to/reading.yaml" lastn="9">
    <$MTReadingListBook display="cover" $>
  </MTReadingList>
</div>
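What the container tag does with that template boils down to rendering the inner block once per book and gluing the results together. A minimal sketch of that loop, outside MT: expand_container and cover_html are hypothetical names, a code ref stands in for MT's builder and tokens, and the example.com URLs are placeholders.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the container tag: render the inner block
# once per book and concatenate the results.
sub expand_container {
    my ($books, $inner) = @_;
    my $content = "";
    for my $book (@$books) {
        $content .= $inner->($book);
    }
    return $content;
}

# Stand-in for the inner tag: a linked cover image for one book.
sub cover_html {
    my $book = shift;
    return sprintf qq{<a href="%s"><img border="0" alt="%s" src="%s" /></a>\n},
        $book->{url}, $book->{title}, $book->{img} || "";
}

my @books = (
    { title => "First",  url => "http://example.com/1", img => "http://example.com/1.jpg" },
    { title => "Second", url => "http://example.com/2" },    # no cover image
);

my $html = expand_container(\@books, \&cover_html);
print $html;
```

A book with no image still gets a link, just with an empty src, which mirrors the `|| ""` in the plugin.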

The only remaining piece is the Data::BookList module, which is a simple ‘load the data from YAML, and return whichever ones I want’:

package Data::BookList;
 
use strict;
use warnings;
 
use YAML;
 
sub new {
  my ($class, $src) = @_;
  my $books = YAML::LoadFile($src) or return;
  bless { _booklist => $books->{books}, }, $class;
}
 
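sub reading_list {
  my ($self, $args) = @_;
  my @books = @{ $self->{_booklist} };
  if (exists $args->{current}) {
    @books = grep $_->{current}, @books;
  }
  if (exists $args->{lastn}) {
    # ISO dates sort correctly as strings; newest first.
    @books = sort { $b->{date} cmp $a->{date} } @books;
    # Trim to lastn; a list slice would pad with undef when fewer
    # than lastn books are available.
    splice @books, $args->{lastn} if @books > $args->{lastn};
  }
  return @books;
}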
 
1;

This allows me to ask for only ‘current’ books and/or the ‘lastn’ books: currently 9 for my blog. I plan to add more features here later, but for now this does what I need.
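To see the two options in action without a YAML file or MT, here is a stand-alone sketch of the same filtering. select_books is a hypothetical name, and it takes the book list directly rather than loading it. It trims with splice, so asking for 9 books when only 1 matches returns just that 1.

```perl
use strict;
use warnings;

# Stand-alone version of the filtering, for illustration only.
sub select_books {
    my ($books, $args) = @_;
    my @books = @$books;
    @books = grep { $_->{current} } @books if exists $args->{current};
    if (exists $args->{lastn}) {
        # YYYY-MM-DD strings sort chronologically with a plain string compare.
        @books = sort { $b->{date} cmp $a->{date} } @books;
        # Drop everything past the first lastn entries, if there are that many.
        splice @books, $args->{lastn} if @books > $args->{lastn};
    }
    return @books;
}

my @shelf = (
    { isbn => "0596007515", date => "2004-11-01", current => 1 },
    { isbn => "0439977789", date => "2004-11-01" },
    { isbn => "075093204X", date => "2004-10-09" },
);

my @latest  = select_books(\@shelf, { lastn => 2 });
my @current = select_books(\@shelf, { current => 1, lastn => 9 });
print scalar(@latest), " latest, ", scalar(@current), " current\n";
```

Note that, as in the module, it is the presence of the 'current' argument that switches the filter on, not its value.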

In some ways this is all over-complicated if all I wanted was a ‘recent reading’ section on my blog. But I find the separation of concerns useful. Managing my raw data is distinct from fetching information about it, which is distinct from slicing that data up, which is distinct from presenting it on my blog. So, when I find an ontology for expressing all this in RDF, I should really only need to write a new presentation script.

Of course, in practice, the ontology will specify some fields that I don’t currently store, so I’ll probably need to also expand the amazon lookup code, and it’ll probably want me to do my dates differently, etc., but that’s the theory anyway!

Monthly Archives

January 6th, 2003 No comments

As I said a few days ago, I had planned to use the Month at a Glance calendar for my monthly archives.

But this proved much more difficult than expected. It should have been simple. Mark had already provided all the templates, the stylesheet, the images etc. But I couldn’t work out how to actually make MT use different templates for the monthly archives than the daily ones. The templates menu only provides a single template for a “Date-Based Archive”.

I couldn’t find anything obvious in the documentation, so I asked Mark, and he pointed me to the Archiving section of the Configuration. Here you can choose which sort of template you use for each type of archive (Daily, Monthly, Weekly etc.). But again, I couldn’t easily see how to change the template file. Confusingly, there is an input box for “Archive File Template”, but that isn’t actually the template for the archive; it is a way to specify what the filename for each of your archives should be (so that you can have 2003/01/01.html instead of 2003_01_01.html, for example).
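The difference between those two filename schemes is easy to show in a few lines of Perl. This is only an illustration of the naming idea, using strftime-style patterns; archive_filename is a hypothetical helper, and this is not a claim about how MT itself evaluates the “Archive File Template” box.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build an archive filename for a date from a
# strftime-style pattern.
sub archive_filename {
    my ($pattern, $year, $month, $day) = @_;
    # strftime wants the month as 0-11 and the year as years since 1900.
    return strftime($pattern, 0, 0, 0, $day, $month - 1, $year - 1900);
}

my $nested = archive_filename("%Y/%m/%d.html", 2003, 1, 1);
my $flat   = archive_filename("%Y_%m_%d.html", 2003, 1, 1);
print "$nested\n$flat\n";
```

The nested pattern gives one directory per year and month, while the flat one keeps every archive page in a single directory.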

I tried “Add new” from this menu, but again it only let me create yet another view of the month using the standard “Date-Based Archive”.

I eventually discovered that I had to go back to the Templates menu and add an entirely new type of Archive Template, which I called “Month at a Glance”. Then when I went back to the Archiving menu and tried “Add new” again, this time my new template was one of the choices.

Unfortunately it still didn’t work from there, as rebuilding the site had no effect. Because I now had 3 different monthly archives set up, MT didn’t seem to want to use my new one (even though it was the only one selected as being active). I had to delete the other monthly archives, and then everything seemed to work.

So I now have a nifty ‘browse’ link under my calendar that lets you step around month by month.