Archive

Posts Tagged ‘code’

Migrating Movable Type to WordPress

February 6th, 2006 Tony No comments

The server on which my weblog used to run is getting rather old and crumbly, and brings with it a constant low-level dread that some day, real soon now, it’s going to give up the ghost. So for the last while, we’ve gradually been moving everything off it. When it came time to move this site, I decided that it was also a good opportunity to migrate away from Movable Type, mostly for the same philosophical reasons that Mark Pilgrim has already set out.

I considered moving to Typo, following Piers’ lead, mostly just so I could play with Ruby, but in the end I decided on WordPress. I’ve made the mistake too many times now of choosing software based on the language in which it’s written. Yes, it would be nicer to be able to hack on my weblog in Ruby or Perl than in PHP, but I know enough PHP to get by, and I doubt I’ll be doing that much hacking anyway.

Setting the weblog up was fairly trivial, as most good PHP installations tend to be. Migrating all my old content wasn’t quite so simple. WordPress 2 seems to have made the import process much simpler than before; most of the information on the process I’ve found relates to older versions and isn’t really applicable any more. Unfortunately, although the simple case of importing my MT archive was fairly painless, I really didn’t want to break all my old links.

There are a few sites that discuss how to maintain your Movable Type post IDs, but they all seem to relate to the old WordPress process. So I had to get my hands dirty in PHP much quicker than expected.

Firstly, I had to edit MT/App/CMS.pm in my MT setup, adding a line to include the entry id in the export output:

AUTHOR: <$MTEntryAuthor$>
TITLE: <$MTEntryTitle$>
ID: <$MTEntryID$>
STATUS: <$MTEntryStatus$>

Then I was able to export all my posts.

I had to post-process the output file, however, as I’ve been creating my posts using the MT Kwiki plugin. This meant that none of my links imported correctly. I spent much too long wrestling with vim’s non-greedy regular expressions before giving up and processing the data in Perl instead:

perl -pe 's/\[(http:.*?) (.*?)\]/$2/g' mt-dump.txt |
perl -pe 's/\[(.*?) (http:.*?)\]/$1/g' > deWikied.txt
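For anyone who wants to try the patterns without Perl to hand, the same two passes can be sketched in JavaScript. This is only an illustration: it assumes Kwiki-style links written as [http://url text] or [text http://url], the deWiki name and the sample lines are made up, and like the one-liners it keeps just the link text.

```javascript
// Sketch of the two de-wikifying passes. deWiki is a hypothetical name,
// and the bracketed link formats are an assumption about the Kwiki markup.
function deWiki(line) {
  return line
    // [http://example.com/ link text] -> link text
    .replace(/\[(http:.*?) (.*?)\]/g, "$2")
    // [link text http://example.com/] -> link text
    .replace(/\[(.*?) (http:.*?)\]/g, "$1");
}

console.log(deWiki("See [http://example.com/ the example site] for details."));
console.log(deWiki("See [the example site http://example.com/] for details."));
```

The non-greedy `.*?` is what stops each match at the first closing bracket rather than swallowing everything up to the last one on the line.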

Then I had to persuade WordPress to maintain the MT ids. In the old WordPress import script it just inserted the posts by hand, and it was a simple matter of ‘fixing’ the SQL it used to do this. But now the importer calls the same code that is used when you create a post through the normal interface.

So I needed to add a check for the ID into the import/mt.php script:

case 'AUTHOR' :
    $post_author = $value;
    break;
case 'ID' :
    $post_ID = $value;
    break;

And then fix the call that inserts the data:

$postdata = compact('post_ID','post_author', 'post_date', 'post_date_gmt', ... );

Then I needed to adjust the wp_insert_post() call to cope with an incoming post_ID:

  if ( !isset($post_ID) )
      $post_ID = 0;
  if ( !isset($post_password) )
      $post_password = '';

and adjust its SQL accordingly:

"INSERT IGNORE INTO $wpdb->posts (id, post_author, post_date, ... ) VALUES
  ($post_ID, '$post_author', '$post_date', ...)");

(The arguments are passed as an extract() of a get_object_vars(), so there’s no need to change any of the other handling).

I believe that this is a safe enough approach that won’t interfere with creating new posts or editing old ones, but you can always revert this file back after importing if there are any problems.

With this in place, I was able to import all my old posts. (There were a lot of them, so I actually had to split the file and import 4 segments in turn). The other thing that the docs don’t make clear is that you need to have an upload directory which is writable by your webserver, but that was easy enough to work out from the error message.

They all came in with the same IDs as they used to have, so then it was just a matter of setting up some Apache redirects on the old server:

Redirect permanent /nothing/index.rdf http://nothing.tmtm.com/feed/
RedirectMatch permanent /nothing/archives/([0-9]{6}).html http://nothing.tmtm.com/archives/$1
RedirectMatch permanent /nothing/archives/([0-9]{4})_([0-9]{2}).html http://nothing.tmtm.com/archives/date/$1/$2
RedirectMatch permanent /nothing/archives/([0-9]{4})_([0-9]{2})_([0-9]{2}).html http://nothing.tmtm.com/archives/date/$1/$2/$3

(I’ve already changed my permalink structure in WordPress to have this style of URL)
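A quick way to sanity-check rules like these before touching the server is to try the patterns against a few sample paths. The sketch below does that in JavaScript. It is not how Apache evaluates the directives: newUrlFor is a hypothetical helper, and the patterns here are anchored and have the dot escaped, which is stricter than the rules above.

```javascript
// Sketch only: the redirect rules restated as JavaScript regular expressions.
// newUrlFor is a made-up helper, not part of Apache or WordPress.
const base = "http://nothing.tmtm.com";
const rules = [
  // RSS feed
  [/^\/nothing\/index\.rdf$/, "/feed/"],
  // individual entries, identified by a six-digit MT entry id
  [/^\/nothing\/archives\/([0-9]{6})\.html$/, "/archives/$1"],
  // monthly archives
  [/^\/nothing\/archives\/([0-9]{4})_([0-9]{2})\.html$/, "/archives/date/$1/$2"],
  // daily archives
  [/^\/nothing\/archives\/([0-9]{4})_([0-9]{2})_([0-9]{2})\.html$/, "/archives/date/$1/$2/$3"],
];

function newUrlFor(path) {
  for (const [pattern, target] of rules) {
    if (pattern.test(path)) return base + path.replace(pattern, target);
  }
  return null; // no rule applies
}

console.log(newUrlFor("/nothing/archives/000123.html"));
console.log(newUrlFor("/nothing/archives/2004_11_12.html"));
```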

There will be many more things to change later to replicate the changes I’d made to my MT set-up, but this at least gets me up and running on WordPress.

The Joys of CSV

November 12th, 2004 Tony No comments

I’ve been working with CSV files a lot recently, mostly as a way of building web based management information tools out of SAGE data.

But I’ve always really hated working with the interface to Text::CSV_XS. So I put together Text::CSV::Simple. You just point it at the file you want, and read out all the rows:

my $parser = Text::CSV::Simple->new;
my @data = $parser->read_file($datafile);

You can tell it you only want certain fields:

$parser->want_fields(1, 2, 4, 8);

And that you want each row back as a hashref rather than just a listref:

$parser->field_map(qw/id name null town/);

There are also trigger points where you can pre- and post-process the data.

It’s certainly made dealing with CSV much easier for me. And it seems to be useful for other people too, as within a few weeks of its release I’ve had several feature requests and bug reports. Usually it takes a couple of months for a new module of mine to build up enough steam to get that.

However, I’ve now had several people all report a problem that I didn’t even consider before: it doesn’t handle newlines in strings. This disturbed me, as I hadn’t realised until then that CSV files could actually contain embedded newlines! Of course, I can’t find any sensible documentation anywhere of what the CSV file format actually does and doesn’t allow, as it seems that Microsoft just made it a de facto standard by making it the main export format from Excel, without ever really specifying how it can be used. The few sites that I found that claim to provide more details on the format are contradictory (e.g. over the issue of header rows).

But it certainly does seem that linebreaks are acceptable, as long as they’re properly quoted. This shoots my whole approach to parsing the files apart, and means I’m going to have to go back and pretty much rewrite the module from scratch. I may even have to lose one of my trigger points: I still want to use Text::CSV_XS to do the actual parsing for me, but I’ll need to hook in at a different level now.
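To make the problem concrete, here is a minimal sketch (in JavaScript, purely for illustration) of why reading a file line by line goes wrong. The parseCsv function and the sample data are my own assumptions, and it assumes the common convention that fields are wrapped in double quotes with embedded quotes doubled. It is not how Text::CSV_XS or Text::CSV::Simple work internally.

```javascript
// Character-by-character CSV reader that tracks whether we are inside quotes,
// so a newline inside a quoted field does not end the record.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; } // doubled quote
      else if (ch === '"') { inQuotes = false; }                     // closing quote
      else { field += ch; }                                          // includes newlines
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field); field = "";
    } else if (ch === "\n") {
      row.push(field); rows.push(row); row = []; field = "";
    } else if (ch !== "\r") {
      field += ch;
    }
  }
  // Flush the last record if the text did not end with a newline.
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

const sample = 'id,name,address\n1,Tony,"1 High Street\nAnytown"\n';
console.log(sample.split("\n").length - 1); // 3 physical lines
console.log(parseCsv(sample).length);       // 2 records
```

Anything that splits on newlines before handing lines to the parser sees three lines here, two of which are broken halves of one record.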

Of course I face my normal Open Source dilemma with this. The code clearly has a bug, but it’s not one that has any effect on me. None of the CSV files I have to deal with have linebreaks inside records. If the code wasn’t released, I’d apply my XP YAGNI principles, and defer the fix until I needed it. In some ways I’d like to be able to tell people who reported the bug that I’ll happily accept a patch if they can fix it, but otherwise they’ll have to wait until I need it. But having public code out there with known bugs irks me, so I guess I’ll just have to find the time from somewhere to fix it myself!

MT Amazon Reading List

November 11th, 2004 Tony No comments

I’ve been asked which plugin I’m using to generate my “reading list” over on the sidebar. Like any true geek, of course, I actually wrote my own. Of course I’m generally into reuse where possible, but I wanted to learn how to write MT plugins, and it seemed like a good place to start. It also helped that I didn’t like any of the 3rd party plugins out there for this. There are probably much better ones available now. Mine also isn’t very good, but it’s part of a bigger plan…

Over the last year or so I’ve gradually been drinking the semantic web kool-aid. I’m sure I’ll rant more about this later: I don’t believe it’s going to happen the way most people have been pushing for it, but I’m now convinced that it is going to happen.

Of course I’m part of the problem for making it happen, as I’m a data geek. I collect structured information. My friends laugh at the fact that I could run queries to tell you how much I’ve spent on milk in the last year, but I find the information useful. (Well maybe not that information, exactly, but the general principle of being able to analyze my spending…)

Unfortunately most of the people wanting to make the semantic web happen are also data geeks who believe in structured information, even though the vast majority of the world aren’t. This is a very big problem for traditional semweb thinking, but I no longer think it matters very much.

But, in the meantime, I want to do stuff with my structured information. Such as the list of books I’ve read.

The first problem was how to store them. I’m reasonably well known to be a database guy. I also have a simple framework for building simple web apps to manage databases, so I considered building one for managing my books. But that seemed like too much hassle for now – I really wanted to just edit a file when I started reading a new book.

Faced with this problem, most techies these days seem to instinctively reach for XML. Personally I can’t stand it. I really hate how verbose it is. Unfortunately a large part of the Semantic Web work is also based around XML. Theoretically you can express your RDF in other ways, but really almost everyone is using XML. This used to bother me as I thought I’d need to do this, but now I believe that the more obscure and arcane we can make this stuff the better, as then everyone will want tools to do it, and only masochists will end up doing it by hand.

So for my books I, instead, reached instinctively for YAML. I thought for a while about what information I’d want to store, before realising that I was much too lazy to want to type any information that could be found elsewhere. So my YAML file really just includes the ISBN of the book, and the rough date that I read it. Of course I don’t usually read a book in one day – I quite often read 4 or 5 books simultaneously over a period, just to get an interplay of ideas happening. And there are lots of books I start, read about half of, and don’t get round to finishing for months, or sometimes even years (if at all). I spent a while trying to find a sensible way to model that, before deciding it was all much too complex, and I’d be happy enough with just entering a rough date.

So I ended up with a very basic YAML file:

---
books:
 
   - isbn    : "0596007515"
     title   : "Ggl Hacks"
     date    : "2004-11-01"
     current : 1
 
   - isbn    : "0439977789"
     title   : "Ruby / Smoke"
     date    : "2004-11-01"
 
   - isbn    : "075093204X"
     title   : "Decline and Fall Everybody"
     date    : "2004-10-09"

The ‘title’ field is there just as a placeholder to aid human readability. It never actually gets used anywhere, so I can fill it with shorthand etc. The ‘current’ field is for books I’m still reading. This is my token concession to the “I started this a month ago but haven’t finished yet” problem.

The next phase is to turn that into a more detailed YAML file that includes proper titles, Amazon links, cover URLs etc.

I have a small perl script to do that:

#!/usr/bin/perl
 
use strict;
use warnings;
 
use YAML;
use Net::Amazon ();
use Cache::File ();
 
my $yaml = YAML::LoadFile(shift || "reading-yaml.txt");
my @out = map expanded_data($_), @{ $yaml->{books} };
print Dump { books => \@out };
 
sub expanded_data {
  my $book = shift;
  my $property = get_book(sprintf "%010s", $book->{isbn});
  return {
    %$book,
    isbn  => sprintf( "%010s", $book->{isbn} ),
    title => $property->title,
    img   => $property->ImageUrlSmall,
    url   => $property->url,
  };
}
 
BEGIN {
  my %amzn_opt = (
      token        => "MY_AMAZON_KEY",
      affiliate_id => "tmtm-20",
      cache        => Cache::File->new(
        cache_root      => '/tmp/amzn_cache',
        cache_umask     => 000,
        default_expires => '30 day',
      ),
  );
  my $us = Net::Amazon->new(%amzn_opt);
  my $uk = Net::Amazon->new(%amzn_opt, locale => "uk");
 
  sub get_book {
    my $isbn = sprintf "%010s", shift;
    my $resp = $uk->search( asin => $isbn );
    $resp = $us->search( asin => $isbn ) unless $resp->is_success;
    die "Can't find $isbn" unless $resp->is_success;
    my ($property) = $resp->properties;
    return $property;
  }
}

It simply reads in my raw book file, uses Amazon Web Services to look up more data about the books (caching the data for 30 days to speed the whole thing up on later runs), and throws out a new YAML file with more fields. Amazon US has slightly more likelihood of having cover scans, so I check it first, falling back on the UK if there are no results there. I pick up a lot of my books in the US anyway, so it isn’t that much of an issue, although I occasionally get a different cover from the one that I have.

Then I have a simple MT plugin, called mt-reading.pl which I drop straight into my MT/cgi-bin/plugins/ directory:

package MT::Plugin::ReadingList;
 
use lib '/usr/local/MT/cgi-bin/lib';
 
use MT::Template::Context;
use Data::BookList;
 
MT::Template::Context->add_container_tag(
  ReadingList => sub {
    my ( $ctx, $args ) = @_;
    my $builder = $ctx->stash('builder');
    my $tokens = $ctx->stash('tokens');
 
    my $yaml_src = $args->{src}
      or return $ctx->error("No YAML source file specified.");
 
    my $list = Data::BookList->new($yaml_src)
      or return $ctx->error("Invalid YAML source file");
 
    my $content = "";
    for my $book ( $list->reading_list($args) ) {
      $ctx->stash( book => $book );
      $content .= $builder->build( $ctx, $tokens );
    }
    return $content;
 
  }
);
 
MT::Template::Context->add_tag(
  ReadingListBook => sub {
    my $book = shift->stash('book');
    my $args = shift || {};
    $book->{cover} ||= sprintf qq{<a href="%s"><img
      border="0" alt="%s" src="%s" /></a>},
        $book->{url}, $book->{title}, $book->{img} || "";
    return exists $args->{display}
      ? $book->{ $args->{display} }
      : $book->{cover};
  }
);
 
1;

This simply adds two new tags ‘ReadingList’ and ‘ReadingListBook’ that I can add to my MT templates, and have them expanded at build time.

So, in my template I include something like this:

<p>Recent Reading</p>
<div class="book">
  <MTReadingList src="/path/to/reading.yaml" lastn="9">
    <$MTReadingListBook display="cover" $>
  </MTReadingList>
</div>

The only remaining piece is the Data::BookList module, which is a simple ‘load the data from YAML, and return whichever ones I want’:

package Data::BookList;
 
use strict;
use warnings;
 
use YAML;
 
sub new {
  my ($class, $src) = @_;
  my $books = YAML::LoadFile($src) or return;
  bless { _booklist => $books->{books}, }, $class;
}
 
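sub reading_list {
  my ($self, $args) = @_;
  my @books = @{ $self->{_booklist} };
  if (exists $args->{current}) {
    # Keep only the books flagged as still being read.
    @books = grep $_->{current}, @books;
  }
  if (exists $args->{lastn}) {
    # Newest first, then trim. Truncating (rather than slicing 0 .. lastn-1)
    # avoids padding the list with undef when there are fewer books than lastn.
    @books = sort { $b->{date} cmp $a->{date} } @books;
    $#books = $args->{lastn} - 1 if @books > $args->{lastn};
  }
  return @books;
}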
 
1;

This allows me to ask for only ‘current’ books and/or the ‘lastn’ books: currently 9 for my blog. I plan to add more features here later, but for now this does what I need.
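The selection logic is small enough to sketch outside Perl too. This JavaScript version is illustrative only: readingList and the sample records are assumptions of mine, and it relies on the dates being YYYY-MM-DD strings so that they sort correctly as plain text.

```javascript
// Filter to current books and/or keep the most recent `lastn`.
function readingList(books, args = {}) {
  let list = books.slice(); // copy, so the caller's array is left alone
  if ("current" in args) {
    list = list.filter((book) => book.current);
  }
  if ("lastn" in args) {
    // Newest first; YYYY-MM-DD strings compare correctly as text.
    list.sort((x, y) => (x.date < y.date ? 1 : x.date > y.date ? -1 : 0));
    list = list.slice(0, Number(args.lastn));
  }
  return list;
}

const books = [
  { isbn: "0596007515", date: "2004-11-01", current: 1 },
  { isbn: "0439977789", date: "2004-11-01" },
  { isbn: "075093204X", date: "2004-10-09" },
];
console.log(readingList(books, { lastn: 2 }).map((b) => b.isbn));
console.log(readingList(books, { current: 1 }).map((b) => b.isbn));
```

Asking for more books than exist simply returns the whole list, which is the behaviour I want on a young reading list.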

In some ways this is all over-complicated if all I wanted was a ‘recent reading’ section on my blog. But I find the separation of concerns useful. Managing my raw data is distinct from fetching information about it, which is distinct from slicing that data up, which is distinct from presenting it on my blog. So, when I find an ontology for expressing all this in RDF I should really only need to write a new presentation script.

Of course, in practice, the ontology will specify some fields that I don’t currently store, so I’ll probably need to also expand the amazon lookup code, and it’ll probably want me to do my dates differently, etc., but that’s the theory anyway!

::Simple

May 1st, 2004 Tony No comments

As of this morning there are just under 250 modules on CPAN matching ‘::Simple’. I take a certain amount of blame for this. I’ve released a couple myself, and my kitchen has a credit as the birthplace of Test::Simple.

But I like to think that in those cases the module really does deserve the ‘simple’ moniker. Perl’s spreadsheet modules are notoriously complex, and there’s no need to jump through all the hoops of two-dimensional cell access and data vs. formatting if all you want to do is read or write each row as an array. Similarly, Test::Simple has one trivial test function that is all a beginning test-writer needs to get into the way of writing tests, and there’s a clear migration path up to Test::More.

But somewhere along the line ::Simple seems to have mutated into ::I::Don’t::Like::The::Normal::Syntax. Take DBD::mysql::SimpleMySQL. Ignoring the fact that it’s in completely the wrong namespace as it’s not a driver, I’m at a complete loss to see how it’s “simpler” than, well, anything really.

Now, I’m obviously biased, but even the example in the docs makes my brain hurt:

my $select = ['Passwd.*', 'UsrGrp.UsrGrpName'];
my $from = ['Passwd'];
my $joins = [];
push @{$joins}, join_struct("PasswdHostGrp", "Passwd.PasswdID", "PasswdHostGrp.PasswdID");
push @{$joins}, join_struct("UsrGrp", "Passwd.PrimaryGroupID", "UsrGrp.UsrGrpID");
my $wheres = "PasswdHostGrp.HostGrpID IN ('group1', 'group2')";
my $arrayref = dbselect_array($dbh, build_select($select, $from, $joins, $wheres, 0))

I can understand trying to make it slightly easier to build SQL dynamically, or trying to provide a way to abstract and package up SQL patterns, but I really don’t understand programmers’ fascination with trying to completely reinvent SQL. The basics are really quite straightforward, especially at the level that these modules are able to ‘hide’ from you. And it’s going to be much simpler and more powerful when you eventually need to do something considerably more complex. There’s really no substitute for learning how to use SQL if you’re going to work with databases.

In fact I’d say that trying to reimplement SQL is a move in exactly the wrong direction. Especially in a dynamic and flexible language like Perl, it’s good practice to invent little mini-languages for the domain in which you’re working. But, of course when working with databases there’s no need to invent the language – it already exists!

Somehow I doubt this will stop the proliferation of modules trying to do away with SQL. Soon we’ll probably have as many database abstraction libraries as we do templating systems…


Object Oriented JavaScript

January 8th, 2003 Tony No comments

I can see lots of similarities between JavaScript and Perl. Both are languages that are often written by people with no real programming experience, just to get a job done – usually involving web sites. A lot of the code in each isn’t written from scratch, but starts by taking some other code that almost does what you want, and hacking it around until it does do what you want. And, as a result, a lot of the code that exists in both languages is really ugly, clumsy, and contains lots of special case code and lots of subtle bugs. Which the next person to adapt the script hacks around until it does what they want. And so on.

But beneath it all, both are actually very powerful languages, which can be well written, clean, expressive, and well factored.

And whilst I’m perfectly at home writing Perl like that, my JavaScript skills are still rather lacking.

So, I was playing around with Object Oriented JavaScript over the holidays. I found a good example at ChunkySoup, but I still wasn’t entirely happy with the results.

The code for the test page is still a little uglier and more repetitive than I’d like. In particular, each link on the page still needs to explicitly set up its own handling code for the image rollovers etc.:

<div id="link1"><a href="DSCN1428.jpg"
onmouseover="elements[0].handleMouseOver()"
onmouseout="elements[0].handleMouseOut()" onclick="return
elements[0].handleClick()">1</a></div>
<div id="link2"><a href="DSCN4337.jpg"
onmouseover="elements[1].handleMouseOver()"
onmouseout="elements[1].handleMouseOut()" onclick="return
elements[1].handleClick()">2</a></div>
 
<div id="link3"><a href="DSCN4358.jpg"
onmouseover="elements[2].handleMouseOver()"
onmouseout="elements[2].handleMouseOut()" onclick="return
elements[2].handleClick()">3</a></div>
<div id="link4"><a href="DSCN4373.jpg"
onmouseover="elements[3].handleMouseOver()"
onmouseout="elements[3].handleMouseOut()" onclick="return
elements[3].handleClick()">4</a></div>
<div id="link5"><a href="DSCN1509.jpg"
onmouseover="elements[4].handleMouseOver()"
onmouseout="elements[4].handleMouseOut()" onclick="return
elements[4].handleClick()">5</a></div>

And setting up the JavaScript that gets called onLoad is a little repetitive too:

var elements = new Array();
var thumbnailID = "thumbnail"; // this is universal for the page
var emptyimg = "blank.gif"; // this is universal for the page
var photoID = "bigimage"; // this is universal for the page
 
function initpage() {
  elements[0] = new csnPhotoNavObject(new
csnPhotoObject(thumbnailID,emptyimg,"DSCN1428tn.jpg",photoID,"DSCN1428.jpg"));
  elements[1] = new csnPhotoNavObject(new
csnPhotoObject(thumbnailID,emptyimg,"DSCN4337tn.jpg",photoID,"DSCN4337.jpg"));
  elements[2] = new csnPhotoNavObject(new
csnPhotoObject(thumbnailID,emptyimg,"DSCN4358tn.jpg",photoID,"DSCN4358.jpg"));
  elements[3] = new csnPhotoNavObject(new
csnPhotoObject(thumbnailID,emptyimg,"DSCN4373tn.jpg",photoID,"DSCN4373.jpg"));
  elements[4] = new csnPhotoNavObject(new
csnPhotoObject(thumbnailID,emptyimg,"DSCN1509tn.jpg",photoID,"DSCN1509.jpg"));
}

So, I figured it should be possible to abstract some of that away further too. The JavaScript should be able to dynamically alter the DOM and set up all the event handlers. Then the links could just be set up as:

<a id="link1" href="DSCN1428.jpg">1</a>
<a id="link2" href="DSCN4337.jpg">2</a>
<a id="link3" href="DSCN4358.jpg">3</a>
<a id="link4" href="DSCN4373.jpg">4</a>
<a id="link5" href="DSCN1509.jpg">5</a>

And at the start of the page, I’d just want to associate images with each link:

function initpage() {
  addImage("link1", "DSCN1428.jpg", "DSCN1428tn.jpg");
  addImage("link2", "DSCN4337.jpg", "DSCN4337tn.jpg");
  addImage("link3", "DSCN4358.jpg", "DSCN4358tn.jpg");
  addImage("link4", "DSCN4373.jpg", "DSCN4373tn.jpg");
  addImage("link5", "DSCN1509.jpg", "DSCN1509tn.jpg");
}

The addImage JavaScript would then find the link element whose id is the first parameter, associate an onclick() with the image named in the second parameter, and a rollover with the thumbnail named in the third.

After a lot of playing around I ended up with a nice abstract JavaScript addImage function that does just this:

function addImage(id, img, thumb) {
  var pno = new csnPhotoNavObject(new csnPhotoObject("thumbnail","blank.gif",thumb,"bigimage",img));
  var img = document.getElementById(id);
  img.onmouseover = function() { pno.handleMouseOver(); };
  img.onmouseout  = function() { pno.handleMouseOut(); };
  img.onclick     = function() { return pno.handleClick(); };
}

What I don’t like about this though, is the need to set up the anonymous closures. (Of course, before this I didn’t even know I could actually do that in JavaScript!). I can’t see why I can’t just say:

  img.onmouseover = pno.handleMouseOver;
  img.onmouseout  = pno.handleMouseOut;

The onmouseover and onmouseout need to be assigned a function. But if I give them the foreign object function directly, then something later gets confused. Whereas if I give it an anonymous function that just calls that other function, everything works just fine.

I don’t know if I’m doing something wrong. Or if there’s some strange JavaScript language issue I don’t know about yet. Or what.

Anyone any suggestions?
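One possible explanation, offered as a guess rather than a diagnosis of the ChunkySoup code (which isn’t shown here): in JavaScript, `this` is decided by how a function is called, not by where it was defined. Assigning pno.handleMouseOver directly hands the element a bare function, so when it is later called as a method of the element, `this` is the element rather than pno. The sketch below uses plain objects in place of DOM elements so that it runs anywhere. Nav, fakeLink and the name property are all made up for illustration.

```javascript
// A tiny constructor with a method that depends on `this`.
function Nav(name) {
  this.name = name;
}
Nav.prototype.handleMouseOver = function () {
  // `this` is whatever object the function was called on
  return this.name;
};

const pno = new Nav("photo 1");
const fakeLink = { name: "the link element" }; // stand-in for a DOM element

// Direct assignment: the method is detached from pno.
fakeLink.onmouseover = pno.handleMouseOver;
console.log(fakeLink.onmouseover()); // "the link element"

// Wrapping it in a closure keeps the call going through pno.
fakeLink.onmouseover = function () { return pno.handleMouseOver(); };
console.log(fakeLink.onmouseover()); // "photo 1"

// Where Function.prototype.bind is available, it does the same job
// without writing the wrapper by hand.
fakeLink.onmouseover = pno.handleMouseOver.bind(pno);
console.log(fakeLink.onmouseover()); // "photo 1"
```

If that is what is happening, the anonymous closures are not a workaround so much as the thing that keeps the handler attached to the right object.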


Camel Poop

January 5th, 2003 Tony No comments

In response to the Camel POOP article on Evolt, Simon Willison complains: “I have no intention of starting a language war, but my God this is ugly. Still, I guess it must work for some people.”

As a user and fan of Perl, I have to agree. This is exceptionally ugly. In fact it’s the sort of thing that turns people against Perl. Thankfully OO in Perl doesn’t have to be like that.

Let’s take his Person class:

#class Person
package Person;
use strict;
use Address;    #Person class will contain an Address
 
#constructor
sub new {
  my ($class) = @_;
  my $self = {
    _firstName => undef,
    _lastName  => undef,
    _ssn       => undef,
    _address   => undef
  };
  bless $self, $class;
  return $self;
}
 
#accessor method for Person first name
sub firstName {
  my ( $self, $firstName ) = @_;
  $self->{_firstName} = $firstName if defined($firstName);
  return $self->{_firstName};
}
 
#accessor method for Person last name
sub lastName {
  my ( $self, $lastName ) = @_;
  $self->{_lastName} = $lastName if defined($lastName);
  return $self->{_lastName};
}
 
#accessor method for Person address
sub address {
  my ( $self, $address ) = @_;
  $self->{_address} = $address if defined($address);
  return $self->{_address};
}
 
#accessor method for Person social security number
sub ssn {
  my ( $self, $ssn ) = @_;
  $self->{_ssn} = $ssn if defined($ssn);
  return $self->{_ssn};
}
 
sub print {
  my ($self) = @_;
 
  #print Person info
  printf("Name:%s %s\n\n", $self->firstName, $self->lastName );
}
 
1;

There are numerous problems with this. Let’s start from the top.

  1. The ‘use Address’ is completely needless. Misleading comment notwithstanding, nothing in the package actually uses the Address module, so there’s no need to load it.
  2. The constructor is complete overkill. The object is going to be a hash, but in Perl hash keys autovivify the first time you assign to them, so there’s absolutely no need to set up lots of keys that contain undef. As the constructor does nothing, it could simply be sub new { bless {}, shift }.
  3. The data methods all do exactly the same thing. This breaks one of the cardinal rules of programming – Don’t Repeat Yourself. firstName, lastName, address, and ssn are all trivial accessor/mutator methods. They could all be abstracted away in a variety of methods, but as this is Perl, someone else has already done that for us. Class::Accessor lets us set up all these methods simply by doing:
    Person->mk_accessors(qw/firstName lastName address ssn/);
  4. The constructor doesn’t allow you to set object data. It’s pretty much a matter of style, but in general it’s nice to be able to instantiate your object with data, rather than having to call all the mutators in turn. As this is a simple hash-based object (as with 90% of all Perl objects) Class::Accessor gives us a default new() as well that allows us to pass a hashref of the data members.
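The idea behind generated accessors can be sketched in a few lines. This JavaScript version is only an analogy with made-up names (mkAccessors and this Person constructor). It is not how Class::Accessor is implemented, though it follows the same get-or-set convention as the hand-written methods above, and it lets the object be created with its data in one go.

```javascript
// Generate one combined getter/setter per field name, instead of
// writing each near-identical method out by hand.
function mkAccessors(ctor, fields) {
  for (const field of fields) {
    ctor.prototype[field] = function (value) {
      if (value !== undefined) this["_" + field] = value; // set when given a value
      return this["_" + field];                           // always return the current value
    };
  }
}

function Person(data = {}) {
  // Accept initial data at construction time.
  for (const [key, value] of Object.entries(data)) this["_" + key] = value;
}
mkAccessors(Person, ["firstName", "lastName", "address", "ssn"]);

const p = new Person({ firstName: "Khurt" });
p.lastName("Williams");
console.log(p.firstName(), p.lastName());
```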

So, this entire class could be replaced with:

package Person;
 
use strict;
use base 'Class::Accessor';
 
Person->mk_accessors(qw/firstName lastName address ssn/);
 
sub print {
  my ($self) = @_;
  printf "Name:%s %s\n\n", $self->firstName, $self->lastName;
}
 
1;

Similarly, the Employee subclass, which simply adds ‘id’ and ‘title’ data members, and overrides print could become:

package Employee;
 
use strict;
use base 'Person';
 
Employee->mk_accessors(qw/id title/);
 
sub print {
  my ($self) = @_;
  $self->SUPER::print;
  printf("Name:%s %s\n\n", $self->id, $self->title );
}
 
1;

(The example on the site makes no sense as it keeps referring to an Address class that doesn’t seem to exist. I’ve assumed that the overridden print should be outputting the extra data members of this class instead…)

Then, instead of the longwinded test program (with an eval to catch exceptions that’s completely needless as neither class throws them):

use Employee;
 
#create Employee class instance
my $khurt =  eval { new Employee(); }  or die ($@);
 
#set object attributes
$khurt->firstName('Khurt');
$khurt->lastName('Williams');
$khurt->id(1001);
$khurt->title('Executive Director');
 
#diplay Employee info
$khurt->print();

We can simply have:

use Employee;
 
my $khurt = Employee->new({
  firstName => 'Khurt',
  lastName  => 'Williams',
  id        => 1001,
  title     => 'Executive Director',
});
 
$khurt->print();

It may not be as nice as OO in some other languages, but it’s a lot nicer than the example in the article… and as it’s Perl, there’s plenty of other approaches if you want to dice it another way.


Importing Radio posts to MT

January 5th, 2003 Tony No comments

A couple of people have asked for more pointers on exactly how I got my Radio posts into MT. So here’s some more detailed information.

Firstly, I ran into lots of problems with Radio seemingly caching macros. When I edit Radio macro files and resave them, Radio doesn’t seem to notice the changes for about 5 minutes. That is far from ideal for the programming approach I use (particularly in a language I’m not that familiar with, such as UserTalk), which basically involves keeping the code running at all times (code a line, save it, test it still works). I would never have survived as a programmer in the old days of coding ‘offline’.

I originally thought this was to do with my set up (I have my PC’s C: drive mounted via samba onto the linux box on which I do most of my programming, as I find it much easier to code in that environment), but using Notepad didn’t seem to make any difference, and there are a few scattered references to this problem littering the Userland noticeboards.

Anyway, I ended up having to code in the scratchpad instead. The script is below. It was cut-n-pasted from Radio’s outline editor so the formatting is a little strange, but it should do the trick. This can either be used as a macro or, as I did, by adding it to the scratchpad and calling <% workspace.showAll ()%> in a page (I just created a new page in my Radio Userland/www/ directory that had that in it).

This outputs all my posts in XML, with the titles and bodies wrapped in CDATA tags. I then wrote a simple Perl script to turn this into the MT input format. You then import this into MT following the instructions in the manual.

I’m sure this could all be tidied up some more, but hopefully it’s of some use to someone as is!

Read more…

The Great Language Shootout

October 2nd, 2002 Tony 1 comment

I was talking to Marty again recently about his anti-Java stance, and we were trying to think of ways in which different languages could be rated.

This of course reminded me to go back and check out Doug Bagley’s Great Language Shootout, which compares multiple languages’ speed and memory usage for doing the same task.

I’d previously helped optimise some of the Perl solutions, and last night I noticed that there were a few new tests from when I last looked at it. In particular the results for the Matrix Multiplication test seemed slightly strange: Perl was much further down the list than I’d have expected.

The meat of the code seemed to be the function mmult:

sub mmult {
    my ($rows, $cols, $m1, $m2) = @_;
    my @m3 = ();
    --$rows; --$cols;
    for my $i (0 .. $rows) {
        my @row = ();
        my $m1i = $m1->[$i];
        for my $j (0 .. $cols) {
            my $val = 0;
            for my $k (0 .. $cols) {
                $val += $m1i->[$k] * $m2->[$k]->[$j];
            }
            push(@row, $val);
        }
        push(@m3, \@row);
    }
    return(\@m3);
}

I played around for a while, and got about a 50% speedup with quite a nasty nested map approach:

use List::Util qw(sum);  # provides the sum() used below

sub mmult2 {
  my ($rows, $cols, $m1, $m2) = @_;
  --$rows; --$cols;
  my @m3 = ();
  for (0 .. $rows) {
    my $i = $_;
    push @m3, [ map {
      my $j = $_;
      sum map $m1->[$i]->[$_] * $m2->[$_]->[$j], 0..$cols;
    } 0 .. $cols ]
  }
  return \@m3;
}

I tried various approaches to turn the outer for() into a map as well, but my brain started hurting too much as it all got very messy.

And then I noticed that we were pushing to @m3 each time around a loop that counted from 0, and realised it would probably be much more efficient to just assign directly each time. So I replaced the push @m3, ... with $m3[$i] = ... , and performance shot up.

So I rolled back all the other changes I’d made, and just applied this straight through:

sub mmult3 {
  my ($rows, $cols, $m1, $m2) = @_;
  my $m3 = [];
  --$rows; --$cols;
  for my $i (0 .. $rows) {
    for my $j (0 .. $cols) {
      $m3->[$i][$j] += $m1->[$i][$_] * $m2->[$_][$j] for 0..$cols;
    }
  }
  return $m3;
}

I think this version is much neater, more idiomatic Perl, and also more understandable and maintainable than not just my optimised one, but the original as well. And it’s 3 times faster.
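If you want to check a claim like this on your own machine, Perl's core Benchmark module makes it easy. The sketch below times the original mmult against mmult3; the mkmatrix helper and the 30x30 size are my own assumptions for illustration, and the ratio you see will vary with your Perl version and hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The original version: builds each row with push.
sub mmult {
    my ($rows, $cols, $m1, $m2) = @_;
    my @m3 = ();
    --$rows; --$cols;
    for my $i (0 .. $rows) {
        my @row = ();
        my $m1i = $m1->[$i];
        for my $j (0 .. $cols) {
            my $val = 0;
            for my $k (0 .. $cols) {
                $val += $m1i->[$k] * $m2->[$k]->[$j];
            }
            push(@row, $val);
        }
        push(@m3, \@row);
    }
    return(\@m3);
}

# The rewritten version: accumulates directly into $m3->[$i][$j].
sub mmult3 {
    my ($rows, $cols, $m1, $m2) = @_;
    my $m3 = [];
    --$rows; --$cols;
    for my $i (0 .. $rows) {
        for my $j (0 .. $cols) {
            $m3->[$i][$j] += $m1->[$i][$_] * $m2->[$_][$j] for 0 .. $cols;
        }
    }
    return $m3;
}

# Hypothetical helper: a $rows x $cols matrix filled with 1, 2, 3, ...
sub mkmatrix {
    my ($rows, $cols) = @_;
    my $count = 1;
    my @m;
    for my $i (0 .. $rows - 1) {
        $m[$i][$_] = $count++ for 0 .. $cols - 1;
    }
    return \@m;
}

my $size = 30;    # arbitrary size chosen for this sketch
my $m1 = mkmatrix($size, $size);
my $m2 = mkmatrix($size, $size);

# Make sure both versions agree before timing them.
my $old = join ',', map { @$_ } @{ mmult($size, $size, $m1, $m2) };
my $new = join ',', map { @$_ } @{ mmult3($size, $size, $m1, $m2) };
die "results differ\n" unless $old eq $new;

# Run each version for at least one CPU second and print a comparison table.
cmpthese(-1, {
    mmult  => sub { mmult($size, $size, $m1, $m2) },
    mmult3 => sub { mmult3($size, $size, $m1, $m2) },
});
```

Checking that the two versions return the same matrix before timing them is cheap insurance against "optimising" your way to a wrong answer.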

Optimising for speed doesn’t always mean trading off maintainability. Usually finding a better approach gets better results than micro-optimisations, and can end up producing an all-round better solution, not just a faster one.

Unfortunately Doug has stopped updating the Shootout pages, so Perl will just have to languish 4 places lower on this test than it should be …


Lightweight XSLT with TT

April 16th, 2002 Tony No comments

I discovered a wonderful Template Toolkit plugin yesterday: XML::Style.

The basic idea, according to the docs, is that you can apply various attributes to your HTML. The example given is of transforming an HTML table:

           [% USE xmlstyle
                  table = {
                      attributes = {
                          border      = 0
                          cellpadding = 4
                          cellspacing = 1
                      }
                  }
           %]
 
           [% FILTER xmlstyle %]
 
           <table>
           <tr>
             <td>Foo</td> <td>Bar</td> <td>Baz</td>
           </tr>
           </table>
 
           [% END %]

This didn’t sit quite right with me though, as that seemed to be something you should be doing in CSS. But as I read through the docs I discovered you can also change tags. Again though the docs gave a bad example:

           [% FILTER xmlstyle
                     th = {
                         element = 'td'
                         attributes = { bgcolor='red' }
                     }
           %]
           <tr>
             <th>Heading</th>
           </tr>
           <tr>
             <td>Value</td>
           </tr>
           [% END %]

Having been playing with XSLT recently, though, a lightbulb went on. The real power of this plugin is in being able to do things like:

   [% USE xmlstyle
        video = {
          pre_start = '<html><head><title>Video Info</title></head><body>'
          element = 'table'
          attributes = { class='videoTable' },
          post_end  = '</body></html>'
        }
 
        title = {
          pre_start = '<tr><td>Title:</td>'
          element    = 'td'
          attributes = { class='videoTitle' }
          post_end  = '</tr>'
        }
 
        price = {
          pre_start = '<tr><td>Price:</td>'
          element    = 'td'
          attributes = { class='videoPrice' }
          post_end  = '</tr>'
        }
   %]

And then, given some XML such as:

    <video>
      <title>La Double Vie De Veronique</title>
      <price>10.99</price>
    </video>

We end up with:

    <html><head><title>Video Info</title></head><body><table class="videoTable">
      <tr><td>Title:</td><td class="videoTitle">La Double Vie De Veronique</td></tr>
      <tr><td>Price:</td><td class="videoPrice">10.99</td></tr>
    </table></body></html>

This could be used as a first step towards “true” XSLT if you’re already using TT. If the only reason you’re moving towards XSLT is because a PHB says to, it might even be enough to convince them that you’ve done so :)
