MT Amazon Reading List

I’ve been asked which plugin I’m using to generate my “reading list” over on the sidebar. Like any true geek, of course, I actually wrote my own. Of course I’m generally into reuse where possible, but I wanted to learn how to write MT plugins, and it seemed like a good place to start. It also helped that I didn’t like any of the 3rd party plugins out there for this. There are probably much better ones available now. Mine also isn’t very good, but it’s part of a bigger plan…

Over the last year or so I’ve gradually been drinking the semantic web kool-aid. I’m sure I’ll rant more about this later: I don’t believe it’s going to happen the way most people have been pushing for it, but I’m now convinced that it’s going to happen.

Of course I’m part of the problem for making it happen, as I’m a data geek. I collect structured information. My friends laugh at the fact that I could run queries to tell you how much I’ve spent on milk in the last year, but I find the information useful. (Well maybe not that information, exactly, but the general principle of being able to analyze my spending…)

Unfortunately most of the people wanting to make the semantic web happen are also data geeks who believe in structured information, even though the vast majority of the world aren’t. This is a very big problem for traditional semweb thinking, but I no longer think it matters very much.

But, in the meantime, I want to do stuff with my structured information. Such as the list of books I’ve read.

The first problem was how to store them. I’m reasonably well known to be a database guy. I also have a simple framework for building simple web apps to manage databases, so I considered building one for managing my books. But that seemed like too much hassle for now – I really wanted to just edit a file when I started reading a new book.

Faced with this problem, most techies these days seem to instinctively reach for XML. Personally I can’t stand it. I really hate how verbose it is. Unfortunately a large part of the Semantic Web work is also based around XML. Theoretically you can express your RDF in other ways, but really almost everyone is using XML. This used to bother me as I thought I’d need to do this, but now I believe that the more obscure and arcane we can make this stuff the better, as then everyone will want tools to do it, and only masochists will end up doing it by hand.

So for my books I, instead, reached instinctively for YAML. I thought for a while about what information I’d want to store, before realising that I was much too lazy to want to type any information that could be found elsewhere. So my YAML file really just includes the ISBN of the book, and the rough date that I read it. Of course I don’t usually read a book in one day – I quite often read 4 or 5 books simultaneously over a period, just to get an interplay of ideas happening. And there are lots of books I start, read about half of, and don’t get round to finishing for months, or sometimes even years (if at all). I spent a while trying to find a sensible way to model that, before deciding it was all much too complex, and I’d be happy enough with just entering a rough date.

So I ended up with a very basic YAML file:

---
books:

   - isbn    : "0596007515"
     title   : "Ggl Hacks"
     date    : "2004-11-01"
     current : 1

   - isbn    : "0439977789"
     title   : "Ruby / Smoke"
     date    : "2004-11-01"

   - isbn    : "075093204X"
     title   : "Decline and Fall Everybody"
     date    : "2004-10-09"

The ‘title’ field is there just as a placeholder to aid human readability. It never actually gets used anywhere, so I can fill it with shorthand etc. The ‘current’ field is for books I’m still reading. This is my token concession to the “I started this a month ago but haven’t finished yet” problem.

The next phase is to turn that into a more detailed YAML file that includes proper titles, Amazon links, cover URLs etc.

I have a small Perl script to do that:

#!/usr/bin/perl
 
use strict;
use warnings;
 
use YAML;
use Net::Amazon ();
use Cache::File ();
 
my $yaml = YAML::LoadFile(shift || "reading-yaml.txt");
my @out = map expanded_data($_), @{ $yaml->{books} };
print Dump { books => \@out };
 
sub expanded_data {
  my $book = shift;
  my $property = get_book(sprintf "%010s", $book->{isbn});
  return {
    %$book,
    isbn  => sprintf( "%010s", $book->{isbn} ),
    title => $property->title,
    img   => $property->ImageUrlSmall,
    url   => $property->url,
  };
}
 
BEGIN {
  my %amzn_opt = (
      token        => "MY_AMAZON_KEY",
      affiliate_id => "tmtm-20",
      cache        => Cache::File->new(
        cache_root      => '/tmp/amzn_cache',
        cache_umask     => 000,
        default_expires => '30 day',
      ),
  );
  my $us = Net::Amazon->new(%amzn_opt);
  my $uk = Net::Amazon->new(%amzn_opt, locale => "uk");
 
  sub get_book {
    my $isbn = sprintf "%010s", shift;
    # Try Amazon US first (better odds of a cover scan), then fall back to the UK.
    my $resp = $us->search( asin => $isbn );
    $resp = $uk->search( asin => $isbn ) unless $resp->is_success;
    die "Can't find $isbn" unless $resp->is_success;
    my ($property) = $resp->properties;
    return $property;
  }
}

It simply reads in my raw book file, uses Amazon Web Services to look up more data about the books (storing the data in cache for 30 days to speed the whole thing up on later runs), and throws out a new YAML file with more fields. Amazon US has slightly more likelihood of having cover scans, so I check it first, falling back on the UK if there are no results there. I pick up a lot of my books in the US anyway, so it isn’t that much of an issue, although I occasionally get a different cover from the one that I have.
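The repeated sprintf "%010s" is presumably there as a guard against an ISBN losing its leading zero somewhere along the way (say, if it ever gets treated as a number). Here is a minimal sketch of that padding on its own; normalize_isbn is a name made up for illustration and isn’t part of the script above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: left-pad an ISBN-10 with
# zeros so a value that lost its leading zero is restored.
sub normalize_isbn {
  my $isbn = shift;
  return sprintf "%010s", $isbn;
}

print normalize_isbn("596007515"),  "\n";   # nine characters in, padded to ten
print normalize_isbn("075093204X"), "\n";   # already ten characters, left alone
```

Padding is all it does: it doesn’t check the ISBN checksum, so a mistyped ISBN still only shows up when the Amazon lookup fails.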

Then I have a simple MT plugin, called mt-reading.pl which I drop straight into my MT/cgi-bin/plugins/ directory:

package MT::Plugin::ReadingList;
 
use strict;
use warnings;
 
use lib '/usr/local/MT/cgi-bin/lib';
 
use MT::Template::Context;
use Data::BookList;
 
MT::Template::Context->add_container_tag(
  ReadingList => sub {
    my ( $ctx, $args ) = @_;
    my $builder = $ctx->stash('builder');
    my $tokens = $ctx->stash('tokens');
 
    my $yaml_src = $args->{src}
      or return $ctx->error("No YAML source file specified.");
 
    my $list = Data::BookList->new($yaml_src)
      or return $ctx->error("Invalid YAML source file");
 
    my $content = "";
    for my $book ( $list->reading_list($args) ) {
      $ctx->stash( book => $book );
      $content .= $builder->build( $ctx, $tokens );
    }
    return $content;
 
  }
);
 
MT::Template::Context->add_tag(
  ReadingListBook => sub {
    my $book = shift->stash('book');
    my $args = shift || {};
    $book->{cover} ||= sprintf qq{<a href="%s"><img
      border="0" alt="%s" src="%s" /></a>},
        $book->{url}, $book->{title}, $book->{img} || "";
    return exists $args->{display}
      ? $book->{ $args->{display} }
      : $book->{cover};
  }
);
 
1;

This simply adds two new tags ‘ReadingList’ and ‘ReadingListBook’ that I can add to my MT templates, and have them expanded at build time.
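The ‘display’ argument just picks a key out of the book hash (so display="title" would return the title), and ‘cover’ is the one key that gets built on demand. A rough sketch of that step in isolation, where cover_html is a hypothetical name and the sample data is invented:

```perl
use strict;
use warnings;

# Hypothetical helper: build the linked cover image for one book hash.
# The keys (url, title, img) are the ones the lookup script writes out.
sub cover_html {
  my $book = shift;
  return sprintf qq{<a href="%s"><img border="0" alt="%s" src="%s" /></a>},
    $book->{url}, $book->{title}, $book->{img} || "";
}

# Made-up sample data; the URLs are placeholders, not real Amazon links.
my %book = (
  title => "Example Book",
  url   => "http://example.com/book",
  img   => "http://example.com/cover.jpg",
);

print cover_html( \%book ), "\n";
```

Note that nothing here escapes the title, so a title containing a double quote would break the alt attribute. That is an assumption to check before relying on it.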

So, in my template I include something like this:

<p>Recent Reading</p>
<div class="book">
  <MTReadingList src="/path/to/reading.yaml" lastn="9">
    <$MTReadingListBook display="cover" $>
  </MTReadingList>
</div>

The only remaining piece is the Data::BookList module, which is a simple ‘load the data from YAML, and return whichever ones I want’:

package Data::BookList;
 
use strict;
use warnings;
 
use YAML;
 
sub new {
  my ($class, $src) = @_;
  my $books = YAML::LoadFile($src) or return;
  bless { _booklist => $books->{books}, }, $class;
}
 
sub reading_list {
  my ($self, $args) = @_;
  my @books = @{ $self->{_booklist} };
  if (exists $args->{current}) {
    @books = grep $_->{current}, @books;
  }
  if (exists $args->{lastn}) {
    # Newest first; trim rather than slice so a short list gains no undef entries.
    @books = sort { $b->{date} cmp $a->{date} } @books;
    splice @books, $args->{lastn} if @books > $args->{lastn};
  }
  return @books;
}
 
1;

This allows me to ask for only ‘current’ books and/or the ‘lastn’ books: currently 9 for my blog. I plan to add more features here later, but for now this does what I need.
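To make the ‘current’ and ‘lastn’ behaviour concrete, here is a small stand-alone sketch that works on in-memory data instead of a YAML file. pick_books is a made-up name, not part of Data::BookList. One detail worth knowing: slicing a list past its end yields undef entries, so this version trims the list instead.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the filtering, for illustration.
# %opt may contain 'current' and/or 'lastn', like the template attributes.
sub pick_books {
  my ( $books, %opt ) = @_;
  my @books = @$books;
  @books = grep { $_->{current} } @books if exists $opt{current};
  if ( exists $opt{lastn} ) {
    # YYYY-MM-DD dates sort correctly as plain strings.
    @books = sort { $b->{date} cmp $a->{date} } @books;
    # Trim rather than slice, so a short list does not pick up undef entries.
    splice @books, $opt{lastn} if @books > $opt{lastn};
  }
  return @books;
}

my @sample = (
  { isbn => "0596007515", date => "2004-11-01", current => 1 },
  { isbn => "0439977789", date => "2004-11-01" },
  { isbn => "075093204X", date => "2004-10-09" },
);

# The two most recent books, then only the ones still being read.
print "$_->{isbn}\n" for pick_books( \@sample, lastn => 2 );
print "$_->{isbn}\n" for pick_books( \@sample, current => 1, lastn => 9 );
```

Books sharing a date come back in no particular guaranteed order, which is one more reason a rough date is only a rough answer.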

In some ways this is all over-complicated if all I wanted was a ‘recent reading’ section on my blog. But I find the separation of concerns useful. Managing my raw data is distinct from fetching information about it, which is distinct from slicing that data up, which is distinct from presenting it on my blog. So, when I find an ontology for expressing all this in RDF I should really only need to write a new presentation script.

Of course, in practice, the ontology will specify some fields that I don’t currently store, so I’ll probably need to also expand the amazon lookup code, and it’ll probably want me to do my dates differently, etc., but that’s the theory anyway!

Understanding Nothing

Tony Bowden's ramblings
