Archive

Posts Tagged ‘Movable Type’

Migrating Movable Type to WordPress

February 6th, 2006 No comments

The server on which my weblog used to run is getting rather old and crumbly, and brings with it a constant low-level dread that some day, real soon now, it’s going to give up the ghost. So for the last while, we’ve gradually been moving everything off it. When it came time to move this site, I decided that it was also a good opportunity to migrate away from Movable Type, mostly for the same philosophical reasons that Mark Pilgrim has already set out.

I considered moving to Typo, following Piers’ lead, mostly just so I could play with Ruby, but in the end I decided on WordPress. I’ve made the mistake too many times now of choosing software based on the language in which it’s written. Yes, it would be nicer to be able to hack on my weblog in Ruby or Perl than in PHP, but I know enough PHP to get by, and I doubt I’ll be doing that much hacking anyway.

Setting the weblog up was fairly trivial, as most good PHP installations tend to be. Migrating all my old content wasn’t quite so simple. WordPress 2 seems to have made the import process much simpler than before; most of the information on the process I’ve found relates to older versions and isn’t really applicable any more. Unfortunately, although the simple case of importing my MT archive was fairly painless, I really didn’t want to break all my old links.

There are a few sites that discuss how to maintain your Movable Type post IDs, but they all seem to relate to the old WordPress process. So I had to get my hands dirty in PHP much quicker than expected.

Firstly, I had to edit MT/App/CMS.pm in my MT setup, adding a line to include the entry id in the export output:

AUTHOR: <$MTEntryAuthor$>
TITLE: <$MTEntryTitle$>
ID: <$MTEntryID$>
STATUS: <$MTEntryStatus$>

Then I was able to export all my posts.

I had to post-process the output file, however, as I’ve been creating my posts using the MT Kwiki plugin. This meant that none of my links imported correctly. I spent much too long wrestling with vim’s non-greedy regular expressions before giving up and processing the data in Perl instead:

perl -pe 's/\[(http:.*?) (.*?)\]/$2/g' mt-dump.txt |
perl -pe 's/\[(.*?) (http:.*?)\]/$1/g' > deWikied.txt
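
As a quick check of what those two substitutions do, here is a minimal sketch that applies them (with the square brackets escaped) to a made-up line. It assumes Kwiki-style links of the form [url label] or [label url]; the sample text and the dewiki() name are purely illustrative.

```perl
use strict;
use warnings;

# Replace Kwiki-style bracketed links with just their label text.
# Handles both "[http://... label]" and "[label http://...]".
sub dewiki {
    my ($line) = @_;
    $line =~ s/\[(http:.*?) (.*?)\]/$2/g;
    $line =~ s/\[(.*?) (http:.*?)\]/$1/g;
    return $line;
}

# Hypothetical sample input, not taken from the real export.
print dewiki("See [http://www.kasei.com/ Kasei] and [the docs http://example.com/docs] today."), "\n";
```

Running both patterns matters because the first only catches links that start with the URL, while the second catches the label-first form.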

Then I had to persuade WordPress to maintain the MT ids. The old WordPress import script just inserted the posts by hand, so it was a simple matter of ‘fixing’ the SQL it used to do this. But now the importer calls the same code that is used when you create a post through the normal interface.

So I needed to add a check for the ID into the import/mt.php script:

case 'AUTHOR' :
    $post_author = $value;
    break;
case 'ID' :
    $post_ID = $value;
    break;

And then fix the call that inserts the data:

$postdata = compact('post_ID','post_author', 'post_date', 'post_date_gmt', ... );

Then I needed to adjust the wp_insert_post() call to cope with an incoming post_ID:

  if ( !isset($post_ID) )
      $post_ID = 0;
  if ( !isset($post_password) )
      $post_password = '';

and adjust its SQL accordingly:

"INSERT IGNORE INTO $wpdb->posts (id, post_author, post_date, ... ) VALUES
  ($post_ID, '$post_author', '$post_date', ...)");

(The arguments are passed as an extract() of a get_object_vars(), so there’s no need to change any of the other handling).

I believe that this is a safe enough approach that won’t interfere with creating new posts or editing old ones, but you can always revert this file back after importing if there are any problems.

With this in place, I was able to import all my old posts. (There were a lot of them, so I actually had to split the file and import 4 segments in turn). The other thing that the docs don’t make clear is that you need to have an upload directory which is writable by your webserver, but that was easy enough to work out from the error message.
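
Splitting the file by hand is error-prone, so here is a rough sketch of how it could be scripted. It assumes each entry in the export ends with a line of exactly eight dashes (MT's entry separator), and the function names and the .partN file naming are made up for illustration.

```perl
use strict;
use warnings;

# Split an MT export into whole entries. Assumes every entry ends with a
# line of exactly eight dashes; the five-dash section markers are left alone.
sub split_entries {
    my ($text) = @_;
    return grep { /\S/ } split /^-{8}\n/m, $text;
}

# Group entries into at most $parts segments of roughly equal size.
sub segments {
    my ($parts, @entries) = @_;
    my $size = int( ( @entries + $parts - 1 ) / $parts ) || 1;
    my @out;
    push @out, [ splice @entries, 0, $size ] while @entries;
    return @out;
}

# Hypothetical usage: perl split-dump.pl deWikied.txt 4
if (@ARGV) {
    my $file  = shift @ARGV;
    my $parts = shift(@ARGV) || 4;
    open my $in, '<', $file or die "Can't read $file: $!";
    my $dump = do { local $/; <$in> };
    my $i = 0;
    for my $seg ( segments( $parts, split_entries($dump) ) ) {
        my $name = sprintf "%s.part%d", $file, ++$i;
        open my $out, '>', $name or die "Can't write $name: $!";
        # Put back the separator that split() removed.
        print {$out} map { $_ . "--------\n" } @$seg;
        close $out or die "Can't close $name: $!";
    }
}
```

Cutting only on entry boundaries means no post ends up half in one segment and half in the next.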

They all came in with the same IDs as they used to have, so then it was just a matter of setting up some Apache redirects on the old server:

Redirect permanent /nothing/index.rdf http://nothing.tmtm.com/feed/
RedirectMatch permanent /nothing/archives/([0-9]{6}).html http://nothing.tmtm.com/archives/$1
RedirectMatch permanent /nothing/archives/([0-9]{4})_([0-9]{2}).html http://nothing.tmtm.com/archives/date/$1/$2
RedirectMatch permanent /nothing/archives/([0-9]{4})_([0-9]{2})_([0-9]{2}).html http://nothing.tmtm.com/archives/date/$1/$2/$3

(I’ve already changed my permalink structure in WordPress to have this style of URL)
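
One way to sanity-check rules like these before touching the server is to mirror them in a small script and feed it some old URLs. This is only a sketch: the patterns here are anchored and escape the dot, so they are slightly stricter than the Apache directives, and redirect_for() is a made-up name.

```perl
use strict;
use warnings;

my $new = 'http://nothing.tmtm.com';

# Each rule pairs a pattern with a sub that builds the new URL
# from the captured groups, in the same order as the Apache rules.
my @rules = (
    [ qr{^/nothing/index\.rdf\z}                     => sub { "$new/feed/" } ],
    [ qr{^/nothing/archives/([0-9]{6})\.html\z}      => sub { "$new/archives/$_[0]" } ],
    [ qr{^/nothing/archives/([0-9]{4})_([0-9]{2})\.html\z}
        => sub { "$new/archives/date/$_[0]/$_[1]" } ],
    [ qr{^/nothing/archives/([0-9]{4})_([0-9]{2})_([0-9]{2})\.html\z}
        => sub { "$new/archives/date/$_[0]/$_[1]/$_[2]" } ],
);

# Return the new URL for an old path, or nothing if no rule applies.
sub redirect_for {
    my ($path) = @_;
    for my $rule (@rules) {
        my ( $re, $build ) = @$rule;
        if ( my @caps = $path =~ $re ) {
            return $build->(@caps);
        }
    }
    return;
}

# Hypothetical old paths to try.
for my $old ( '/nothing/archives/000123.html', '/nothing/archives/2004_11.html' ) {
    my $target = redirect_for($old);
    print "$old -> ", ( defined $target ? $target : '(no redirect)' ), "\n";
}
```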

There will be many more things to change later to replicate the changes I’d made to my MT set-up, but this at least gets me up and running on WordPress.

MT Amazon Reading List

November 11th, 2004 No comments

I’ve been asked which plugin I’m using to generate my “reading list” over on the sidebar. Like any true geek, of course, I actually wrote my own. Of course I’m generally into reuse where possible, but I wanted to learn how to write MT plugins, and it seemed like a good place to start. It also helped that I didn’t like any of the 3rd party plugins out there for this. There are probably much better ones available now. Mine also isn’t very good, but it’s part of a bigger plan…

Over the last year or so I’ve gradually been drinking the semantic web kool-aid. I’m sure I’ll rant more about this later: I don’t believe it’s going to happen the way most people have been pushing for it, but I’m now convinced that it is going to happen.

Of course I’m part of the problem for making it happen, as I’m a data geek. I collect structured information. My friends laugh at the fact that I could run queries to tell you how much I’ve spent on milk in the last year, but I find the information useful. (Well maybe not that information, exactly, but the general principle of being able to analyze my spending…)

Unfortunately most of the people wanting to make the semantic web happen are also data geeks who believe in structured information, even though the vast majority of the world aren’t. This is a very big problem for traditional semweb thinking, but I no longer think it matters very much.

But, in the meantime, I want to do stuff with my structured information. Such as the list of books I’ve read.

The first problem was how to store them. I’m reasonably well known to be a database guy. I also have a simple framework for building simple web apps to manage databases, so I considered building one for managing my books. But that seemed like too much hassle for now – I really wanted to just edit a file when I started reading a new book.

Faced with this problem, most techies these days seem to instinctively reach for XML. Personally I can’t stand it. I really hate how verbose it is. Unfortunately a large part of the Semantic Web work is also based around XML. Theoretically you can express your RDF in other ways, but really almost everyone is using XML. This used to bother me as I thought I’d need to do this, but now I believe that the more obscure and arcane we can make this stuff the better, as then everyone will want tools to do it, and only masochists will end up doing it by hand.

So for my books I, instead, reached instinctively for YAML. I thought for a while about what information I’d want to store, before realising that I was much too lazy to want to type any information that could be found elsewhere. So my YAML file really just includes the ISBN of the book, and the rough date that I read it. Of course I don’t usually read a book in one day – I quite often read 4 or 5 books simultaneously over a period, just to get an interplay of ideas happening. And there are lots of books I start, read about half of, and don’t get round to finishing for months, or sometimes even years (if at all). I spent a while trying to find a sensible way to model that, before deciding it was all much too complex, and I’d be happy enough with just entering a rough date.

So I ended up with a very basic YAML file:

---
books:
 
   - isbn    : "0596007515"
     title   : "Ggl Hacks"
     date    : "2004-11-01"
     current : 1
 
   - isbn    : "0439977789"
     title   : "Ruby / Smoke"
     date    : "2004-11-01"
 
   - isbn    : "075093204X"
     title   : "Decline and Fall Everybody"
     date    : "2004-10-09"

The ‘title’ field is there just as a placeholder to aid human readability. It never actually gets used anywhere, so I can fill it with shorthand etc. The ‘current’ field is for books I’m still reading. This is my token concession to the “I started this a month ago but haven’t finished yet” problem.

The next phase is to turn that into a more detailed YAML file that includes proper titles, Amazon links, cover URLs etc.

I have a small perl script to do that:

#!/usr/bin/perl
 
use strict;
use warnings;
 
use YAML;
use Net::Amazon ();
use Cache::File ();
 
my $yaml = YAML::LoadFile(shift || "reading-yaml.txt");
my @out = map expanded_data($_), @{ $yaml->{books} };
print Dump { books => \@out };
 
sub expanded_data {
  my $book = shift;
  my $property = get_book(sprintf "%010s", $book->{isbn});
  return {
    %$book,
    isbn  => sprintf( "%010s", $book->{isbn} ),
    title => $property->title,
    img   => $property->ImageUrlSmall,
    url   => $property->url,
  };
}
 
BEGIN {
  my %amzn_opt = (
      token        => "MY_AMAZON_KEY",
      affiliate_id => "tmtm-20",
      cache        => Cache::File->new(
        cache_root      => '/tmp/amzn_cache',
        cache_umask     => 000,
        default_expires => '30 day',
      ),
  );
  my $us = Net::Amazon->new(%amzn_opt);
  my $uk = Net::Amazon->new(%amzn_opt, locale => "uk");
 
  sub get_book {
    my $isbn = sprintf "%010s", shift;
    my $resp = $uk->search( asin => $isbn );
    $resp = $us->search( asin => $isbn ) unless $resp->is_success;
    die "Can't find $isbn" unless $resp->is_success;
    my ($property) = $resp->properties;
    return $property;
  }
}

It simply reads in my raw book file, uses Amazon Web Services to look up more data about the books (storing the data in cache for 30 days to speed the whole thing up on later runs), and throws out a new YAML file with more fields. Amazon US has slightly more likelihood of having cover scans, so I check it first, falling back on the UK if there are no results there. I pick up a lot of my books in the US anyway, so it isn’t that much of an issue, although I occasionally get a different cover from the one that I have.

Then I have a simple MT plugin, called mt-reading.pl which I drop straight into my MT/cgi-bin/plugins/ directory:

package MT::Plugin::ReadingList;
 
use lib '/usr/local/MT/cgi-bin/lib';
 
use MT::Template::Context;
use Data::BookList;
 
MT::Template::Context->add_container_tag(
  ReadingList => sub {
    my ( $ctx, $args ) = @_;
    my $builder = $ctx->stash('builder');
    my $tokens = $ctx->stash('tokens');
 
    my $yaml_src = $args->{src}
      or return $ctx->error("No YAML source file specified.");
 
    my $list = Data::BookList->new($yaml_src)
      or return $ctx->error("Invalid YAML source file");
 
    my $content = "";
    for my $book ( $list->reading_list($args) ) {
      $ctx->stash( book => $book );
      $content .= $builder->build( $ctx, $tokens );
    }
    return $content;
 
  }
);
 
MT::Template::Context->add_tag(
  ReadingListBook => sub {
    my $book = shift->stash('book');
    my $args = shift || {};
    $book->{cover} ||= sprintf qq{<a href="%s"><img
      border="0" alt="%s" src="%s" /></a>},
        $book->{url}, $book->{title}, $book->{img} || "";
    return exists $args->{display}
      ? $book->{ $args->{display} }
      : $book->{cover};
  }
);
 
1;

This simply adds two new tags ‘ReadingList’ and ‘ReadingListBook’ that I can add to my MT templates, and have them expanded at build time.

So, in my template I include something like this:

<p>Recent Reading</p>
<div class="book">
  <MTReadingList src="/path/to/reading.yaml" lastn="9">
    <$MTReadingListBook display="cover" $>
  </MTReadingList>
</div>

The only remaining piece is the Data::BookList module, which is a simple ‘load the data from YAML, and return whichever ones I want’:

package Data::BookList;
 
use strict;
use warnings;
 
use YAML;
 
sub new {
  my ($class, $src) = @_;
  my $books = YAML::LoadFile($src) or return;
  bless { _booklist => $books->{books}, }, $class;
}
 
sub reading_list {
  my ($self, $args) = @_;
  my @books = @{ $self->{_booklist} };
  if (exists $args->{current}) {
    @books = grep $_->{current}, @books;
  }
  if (exists $args->{lastn}) {
    @books = sort { $b->{date} cmp $a->{date} } @books;
    # Keep the newest N, without padding when there are fewer than N.
    splice @books, $args->{lastn} if @books > $args->{lastn};
  }
  return @books;
}
 
1;

This allows me to ask for only ‘current’ books and/or the ‘lastn’ books: currently 9 for my blog. I plan to add more features here later, but for now this does what I need.
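
To see the two filters in isolation, here is a small self-contained sketch of the same selection logic run against an in-memory list rather than the YAML file. The select_books() name is made up for the example; the data is the sample from the YAML above.

```perl
use strict;
use warnings;

# Same idea as reading_list(): optionally keep only 'current' books,
# then optionally keep the newest N by date.
sub select_books {
    my ( $args, @books ) = @_;
    @books = grep { $_->{current} } @books if exists $args->{current};
    if ( exists $args->{lastn} ) {
        @books = sort { $b->{date} cmp $a->{date} } @books;
        splice @books, $args->{lastn} if @books > $args->{lastn};
    }
    return @books;
}

my @books = (
    { isbn => '0596007515', date => '2004-11-01', current => 1 },
    { isbn => '0439977789', date => '2004-11-01' },
    { isbn => '075093204X', date => '2004-10-09' },
);

print "lastn=2:  ", join( ", ", map { $_->{isbn} } select_books( { lastn   => 2 }, @books ) ), "\n";
print "current:  ", join( ", ", map { $_->{isbn} } select_books( { current => 1 }, @books ) ), "\n";
```

Storing dates as YYYY-MM-DD is what makes the plain string comparison in the sort work: lexical order and chronological order are the same.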

In some ways this is all over-complicated if all I wanted was a ‘recent reading’ section on my blog. But I find the separation of concerns useful. Managing my raw data is distinct from fetching information about it, which is distinct from slicing that data up, which is distinct from presenting it on my blog. So, when I find an ontology for expressing all this in RDF I should really only need to write a new presentation script.

Of course, in practice, the ontology will specify some fields that I don’t currently store, so I’ll probably need to also expand the amazon lookup code, and it’ll probably want me to do my dates differently, etc., but that’s the theory anyway!

Bizarre Links and Bloglines

November 10th, 2004 No comments

Smylers has just pointed out to me that, for those of you reading this via Bloglines, all my URLs are screwy. Instead of saying things like http://www.kasei.com/, they’re just saying //www.kasei.com.

And although Mozilla, IE, and SharpReader Do The Right Thing here, Bloglines doesn’t. Of course, it’s hard to know what The Right Thing actually is here. When I said DTRT above, I really meant Do What I Mean (which of course is always The Right Thing!)

Fixing this problem turns out to be non-trivial. I’m using Marty’s MT Kwiki Plugin, but it doesn’t really do anything except run the text through CGI::Kwiki. A little investigation on our internal wiki reveals that the problem lies somewhere in CGI::Kwiki itself, as the same problem shows up there too.

But CGI::Kwiki has been supplanted by Kwiki, and I’m not sure there’s a simple upgrade path…

Tidying Up

March 21st, 2003 No comments

So Steve creates a bookmarklet that lets me see that my default MT RSS feed is bad.

Then Beowulf points me to Cynthia, who tells me that my alt = “RSS” tag on my RSS image is too short, as alt tags are supposed to be between 7 and 81 characters long, my “powered by MT” logo doesn’t have an alt tag at all, and my permalinks are bad as they all display the same text but link to different places.

I’ve fixed the others, but what are you meant to do about permalinks?

Monthly Archives

January 6th, 2003 No comments

As I said a few days ago, I had planned to use the Month at a Glance calendar for my monthly archives.

But this proved much more difficult than expected. It should have been simple. Mark had already provided all the templates, the stylesheet, the images etc. But I couldn’t work out how to actually make MT know how to use different templates for the monthly archives than the daily ones. The templates menu only provides a single template for a “Date-Based Archive”.

I couldn’t find anything obvious in the documentation, so I asked Mark, and he pointed me to the Archiving section of the Configuration. Here you can choose which sort of template you use for each type of archive (Daily, Monthly, Weekly etc.) But, again I couldn’t easily see how to change the template file. Confusingly there is an input box for “Archive File Template” but that isn’t actually the template for the archive, but a way to specify what the filename for each of your archives should be (so that you can have 2003/01/01.html instead of 2003_01_01.html, for example).

I tried “Add new” from this menu, but again it only let me create yet another view of the month using the standard “Date-Based Archive”.

I eventually discovered that I had to go back to the Templates menu and add an entirely new type of Archive Template, which I called “Month at a Glance”. Then when I went back to the Archiving menu and tried “Add new” again, this time my new template was one of the choices.

Unfortunately it still didn’t work from there, as rebuilding the site had no effect. Because I now had 3 different monthly archives set up, MT didn’t seem to want to use my new one (even though it was the only one selected as being active). I had to delete the other monthly archives, and then everything seemed to work.

So I now have a nifty ‘browse’ link under my calendar that lets you step around month by month.

Importing Radio posts to MT

January 5th, 2003 No comments

A couple of people have asked for more pointers on exactly how I got my Radio posts into MT. So here’s some more detailed information.

Firstly, I ran into lots of problems with Radio seemingly caching macros. When I edit Radio macro files and resave them, Radio doesn’t seem to notice the changes for about 5 minutes. This is far from ideal for the programming approach I use (particularly in a language I’m not that familiar with, such as UserTalk), which basically involves keeping the code running at all times (code a line, save it, test it still works). I would never have survived as a programmer in the old days of coding ‘offline’.

I originally thought this was to do with my set up (I have my PC’s C: drive mounted via samba onto the linux box on which I do most of my programming, as I find it much easier to code in that environment), but using Notepad didn’t seem to make any difference, and there are a few scattered references to this problem littering the Userland noticeboards.

Anyway, I ended up having to code in the scratchpad instead. The script is below. It was cut-n-pasted from Radio’s outline editor so the formatting is a little strange, but it should do the trick. This can either be used as a macro or, as I did, by adding it to the scratchpad and calling <% workspace.showAll ()%> in a page (I just created a new page in my Radio Userland/www/ directory that had that in it).

This outputs all my posts in XML, with the titles and bodies wrapped in CDATA tags. I then wrote a simple Perl script to turn this into the MT input format. You then import this into MT following the instructions in the manual.
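
The conversion script itself isn't reproduced here, but the output side of it is simple enough to sketch. This assumes the posts have already been parsed out of the XML into hashes; the keys (title, date, body) and the mt_entry() name are hypothetical, and the date is assumed to already be in the MM/DD/YYYY hh:mm:ss AM|PM form that MT's import format expects.

```perl
use strict;
use warnings;

# Render one post as an entry in MT's plain-text import format:
# metadata lines, a five-dash separator, the BODY section, and an
# eight-dash line to end the entry.
sub mt_entry {
    my ($post) = @_;
    return join "",
        "TITLE: $post->{title}\n",
        "DATE: $post->{date}\n",
        "-----\n",
        "BODY:\n",
        "$post->{body}\n",
        "-----\n",
        "--------\n";
}

# Made-up sample post.
print mt_entry(
    {
        title => 'HTTP Conditional Get',
        date  => '10/22/2002 09:30:00 AM',
        body  => 'My blogroll is now much friendlier than before.',
    }
);
```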

I’m sure this could all be tidied up some more, but hopefully it’s of some use to someone as is!

Read more…

Migrating from Radio to Movable Type

January 4th, 2003 No comments

I finally managed to move to MT from Radio. As much as I’ve enjoyed learning all the ins and outs of Radio, I’m just finding it too painful. Whilst the memory leak which slowly kills my computer (unless I restart Radio each time I want to do something and then shut it down again straightaway) is merely irritating, the fact that I can’t post when I’m not at my desktop machine is becoming a major problem.

In particular, I’m heading off to Boston for a week soon and I want to be able to post whilst I’m there. (Anyone who reads this who’s going to be in Boston in the next few weeks and would like to meet up, please drop me a line at tony@tmtm.com)

Actually setting up MT was trivial, but I’ve spent a few days trying to minimise the disruption to readers. I’ve set Apache up to redirect most requests to the relevant page on the new site, but I’ve almost certainly missed some.

There were all sorts of interesting issues with the migration:

1) Export / Import

I wrote a radio macro to export all my posts as XML, and a Perl script that then turned that file into MT’s input format. This was easier than trying to get Radio to export to that format, as I couldn’t find an easy way to output only the information I wanted on a Radio page. It also gave me the chance to make a few other changes in Perl, where I’m much more comfortable.

2) Post Titles

In Radio you can have untitled posts. MT gives those titles – some of which are very strange …

3) Title Links

In Radio I often used the ability to add a link to a post’s title. I couldn’t find a sensible way to import this into MT. I’ll gradually work my way through my historic posts to fix this.

4) Auto-formatting

Both Radio and MT attempt to auto-format your posts. But they do it in subtly different ways. I had to turn off ‘convert line breaks’ in the imported posts by performing an update on the database. This makes most posts look a little strange (mainly because they lose the paragraph breaks). Again, I’ll tidy those up over time.

5) Template and Stylesheets

I’d heavily customised my Radio templates and stylesheets. I had to do the same with MT. It took me a while to work out where some of the templates actually were with MT (particularly the search template which by default applies across the entire MT installation, rather than per blog, and thus can’t be edited in the same way as all the others…), and then a lot longer to modify them all. Again, I’ve almost certainly missed some.

6) Blogroll / Linkback / Google-links

The blogroll and link-back were done via xbit-hack includes, and so could be brought in trivially. However, MT clobbers the file permissions every time it rebuilds the page, making the SSI no longer work. For now I have to remember to chmod it by hand. The google-links used to be created automatically as I wrote my posts and stored in the Radio object database. I still haven’t investigated how to replicate this in MT.

7) Posts on Home Page

With Radio I could say “show the last 7 days of posts on the front page”. In MT I can say the same. However, Radio shows the last 7 days on which there were posts, whether or not they were actually the last 7 days in real time. MT shows the last physical week. So I’ve had to tell it to show the last 10 posts instead.

8) Navigation

I set MT up to have daily archives, but still haven’t found any useful way to navigate around those beyond the back-a-day / forward-a-day links on each page. I quite liked the ‘Month of Posts at a Glance’ calendar that was on Dive Into Mark at some point, so I’ll probably try to find that and use it.
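
The difference described in item 7 is easy to miss, so here is a tiny sketch of the two interpretations. The post data is invented and dates are assumed to be YYYY-MM-DD strings; neither function is taken from Radio or MT.

```perl
use strict;
use warnings;

# Posts from the last $n distinct days that actually have posts (Radio-style).
sub last_n_posting_days {
    my ( $n, @posts ) = @_;
    my %seen;
    my @days = grep { !$seen{$_}++ } sort { $b cmp $a } map { $_->{date} } @posts;
    splice @days, $n if @days > $n;
    my %keep = map { $_ => 1 } @days;
    return grep { $keep{ $_->{date} } } @posts;
}

# Posts dated on or after $cutoff, i.e. a fixed calendar window (MT-style).
sub since {
    my ( $cutoff, @posts ) = @_;
    return grep { $_->{date} ge $cutoff } @posts;
}

# Made-up posting history with a quiet spell over the holidays.
my @posts = (
    { date => '2003-01-04', title => 'Migrating' },
    { date => '2002-12-20', title => 'Before the break' },
    { date => '2002-12-20', title => 'Also before the break' },
    { date => '2002-12-01', title => 'Older still' },
);

printf "Last 2 posting days: %d posts\n", scalar( () = last_n_posting_days( 2, @posts ) );
printf "Since 2002-12-29:    %d posts\n", scalar( () = since( '2002-12-29', @posts ) );
```

After a quiet spell the calendar window can leave the front page nearly empty, which is why a fixed post count is the safer setting.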

If there’s anything else strange about the new version, please let me know!

HTTP Conditional Get

October 22nd, 2002 No comments

My “last updated” blogroll on the left hand side of the page is now much friendlier than before. I’ve made a fairly trivial change to my script so that it now uses LWP::Simple’s mirror() function, rather than its get() one. This uses the If-Modified-Since header to only fetch the body of people’s RSS feeds if they’ve changed. It should not only cut down on bandwidth usage, but also make it much faster to update my blogroll, which was becoming quite slow, as this is all running on a fairly old machine.
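
For anyone wanting to do the same, a minimal sketch of the idea follows. It is not my actual blogroll script: the function names, feed URL and cache path are placeholders, and it relies only on mirror() returning the HTTP status code, with 304 meaning the local copy is still current.

```perl
use strict;
use warnings;

# Decide from the HTTP status whether the cached copy was replaced.
sub feed_changed {
    my ($status) = @_;
    return 0 if $status == 304;                     # Not Modified: keep the cached copy
    return 1 if $status >= 200 && $status < 300;    # fresh body was saved
    die "Fetch failed with HTTP status $status\n";
}

# Fetch $url into $file only if it has changed since $file was last written.
sub refresh_feed {
    my ( $url, $file ) = @_;
    require LWP::Simple;    # loaded lazily; it is not a core module
    # mirror() sends If-Modified-Since based on the file's timestamp.
    my $status = LWP::Simple::mirror( $url, $file );
    return feed_changed($status);
}

# Hypothetical usage:
#   if ( refresh_feed( 'http://example.com/index.rdf', '/tmp/example.rdf' ) ) {
#       # re-read the feed and update its "last updated" time
#   }
```

With get() the whole feed comes back on every run; with mirror() an unchanged feed costs only a short header exchange.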

Using Movable Type’s TrackBack with Radio Userland

July 15th, 2002 No comments

Jeremy refers back to my response to his response to the OnLamp article, berating the fact that Radio doesn’t work with Movable Type’s TrackBack feature.

Well, I did a bit of digging and discovered David Watson’s post on how to make this work.

I tried it out, by inserting the following into the previous post:

<% scratchpad.s = tcp.httpClient (server:"jeremy.zawodny.com", path:"/mt/mt-tb.cgi?tb_id=43&url=http://www.tmtm.com/insanity/2002/07/14.html", ctFollowRedirects:"5"); string.httpResultSplit (scratchpad.s) %>

It seems to work, but seems to add multiple ‘pings’ to Jeremy’s trackback page. I’ll turn this into a radio macro once I discover how to work out what the URL of the current post will be, once created…