Archive

Posts Tagged ‘Wordpress’

Splitting a WordPress blog in two

May 13th, 2009 No comments

This blog had its seventh birthday recently. I know there are many amongst you who have been blogging since before the term was even coined, and who make more posts in a month than I’ve made in seven years, but still.

Anyway, back in the early days of blogging, a significant percentage of blog posts weren’t original content, but the equivalent of retweeting: a way of passing on to your readers something interesting you’d read elsewhere. Of course the vast majority of those were links to other people’s blogs. It’s how word spread about interesting posts before digg and reddit and twitter and the like.

I tried to do something slightly different for a while: rather than just regurgitating other blog posts, I instead regurgitated interesting snippets from real dead tree books I was reading, picking interesting excerpts chapter by chapter.

It seemed to be well received, and I had a lot of fun choosing which couple of paragraphs from each chapter could convey something interesting enough to both stand alone without the surrounding context and also encourage others to seek out the book for more depth.

Early in 2004, I seem to have abandoned the idea. Likely it’s just because I was super-busy with Twingle, and then with Unite, and I probably always meant to get around to picking it up again, but just never did. Until now.

I decided, however, to do this on a new separate blog: dustyvolumes.com. So I had to work out how to move all the old posts to there. This was significantly more complicated than I expected. Doubtless someone will point me to a WordPress plugin that could have made the whole thing take 30 seconds, but in the absence of that, here’s the gory details for anyone else who ever wants to do something like this.

First, of course, I needed to have the new blog set up. I’m assuming that’s self-evident, and needs no further explanation.

Next I needed to find all the posts I wanted to move. I already had them all tagged with “Books”, so this part was fairly easy and avoided an even longer manual process. WordPress doesn’t have an ‘export by tag/category’ option, though—the only way to restrict an export is by author. So I had to go into “Posts > Edit”, find a post with the relevant tag, and click that tag to give me a list of all those posts. Then I could do a Bulk Edit of each to change the author to a new temporary account I set up just for this purpose. There were multiple pages of them, and there doesn’t seem to be a way to operate on more than one page at a time, so I went through them page by page. It was repetitive enough to make me want to find a short-cut, but there weren’t quite enough pages to make it worthwhile.

Then I exported all the posts by my new author, and imported those into the new blog. I did some more tidying up there of tags and categories etc, and found a few posts that should probably still remain on this blog instead (they were tagged with Books too, but were, for example, about me getting rid of my collection before moving to Estonia, rather than being excerpts suitable for Dusty Volumes), so deleted them from there, and changed the author here back to me on each of them in turn (I wanted that author to match exactly the posts that were on the other blog so I could continue to operate on those here).

Now I had the new blog working, but hit the much harder problem of what to do about the posts here. I could, of course, just have deleted the posts that I’d moved, but I still get quite a few hits on them from Google searches and links from other blogs, as well as some internal links to them, and I didn’t want to break all those. After some research I found a couple of WordPress plugins for setting up redirection. The first one I tried, “Redirection“, has lots and lots of features, but wasn’t quite what I wanted. The second, “Redirect“, was perfect. It does only one thing, but does it simply, and does it well. Using the Custom Field options in WordPress, it lets you set a ‘Redirect’ field with a value of the URL that viewers should be redirected to on viewing a given post. So now it was just a matter of going through and setting those up one by one.

Thankfully the WordPress import maintains the post ID from the export, so I didn’t need to spend any time building a map of which IDs should map where: each relevant post would just need to redirect to http://dustyvolumes.com/archives/<id>. I did a couple of these manually to make sure everything was working, but there was no way I wanted to do another 150 or so by hand. It was time to go to the database.

I’ve never actually explored the WordPress schema before, but there aren’t very many tables, and it’s fairly easy to work out what’s going on. (There’s probably decent documentation for it all too, but I tend to prefer to just work things like this out manually.) I’m not going to detail all the SQL commands I had to run: if you don’t know enough to work them out yourself you probably shouldn’t be playing with the database directly anyway, and should just do this the longwinded way (and I really don’t want to be fielding questions on it 6 months from now when the schema has changed). But it was a simple matter to just select the IDs of all posts by my fake ‘author’, and insert the relevant Redirect custom field values.

However, this still left a large number of ‘Books’ entries in my tagcloud that really weren’t there any more, so I also wanted to remove all the tags from these posts too. Ideally the Bulk Edit should be capable of this, but it currently only allows you to add a tag to multiple posts, not remove one, so again I went to the database. This one was slightly trickier, as it’s a cross-table DELETE, but again, if don’t know how to do that, you shouldn’t just be pasting in random SQL you found on someone’s blog somewhere.

Unfortunately, although that successfully removed all the tags, the tag cloud still proudly declared that I had a huge number of “Books” posts. WordPress, presumably for speed, keeps a total of how many posts are assigned to each category in a different table, and, being a typical modern webapp, maintains that count in client code rather than in the database itself. So having manually removed lots of tags without updating the count field too, my database was now out of sync with itself. MySQL doesn’t do cross-table UPDATEs with aggregates, so this time I needed an UPDATE with a subselect of a COUNT(*).

Including lots of cautious exploratory SELECTs, lots of LIMITs of my UPDATEs and DELETEs to make sure the right thing was happening each time, and backing up carefully after each major change, the whole thing took about an hour. I could possibly have done it all via the web interface in that time, but it would have been a close call, and there was a very high chance that I’d have gotten so bored in the middle of it that I’d have abandoned it half-way through, promising to finish it another day (and likely never quite gotten around to it). This way was mentally stimulating rather than draining, thus giving much more satisfaction when done, and I learned much more about the WordPress database structure that could be very useful if I ever decide to write a Plugin.

And now I have two blogs to rarely write in…

Categories

February 6th, 2006 No comments

When using Movable Type I was rather lax about assigning categories to posts, mostly because they weren’t really that useful, as I didn’t use any of the navigation features related to them.

By default in WordPress they seem to be much more prominent. Rather than trying to find a way to minimise that, I’m going to try to train myself to categorise everything better. I’ve spent a couple of hours working my way back through the last two years’ posts, adding them all categories, and rearranging my categories to better fit what I have actually written about regularly.

I still have a ways to go (about 200 posts to categorise), so I’m going to set myself a target of cleaning up 10 of them a day. That way I should have it all done in a few weeks.

It would be much quicker if adding a category to an uncategorised post automatically removed it from “uncategorised” rather than needing that to be done explicitly (and forgetting about 10% of the time)…

Tags:

Migrating Movable Type to WordPress

February 6th, 2006 No comments

The server on which my weblog used to run is getting rather old and crumbly, and brings with it a constant low-level dread that some day, real soon now, it’s going to give up the ghost. So for the last while, we’ve gradually been moving everything off it. When it came time to move this site, I decided that it was also a good opportunity to migrate away from Movable Type, mostly for the same philosophical reasons that Mark Pilgrim has already set out.

I considered moving to Typo, following Piers‘ lead, mostly just so I could play with Ruby, but in the end I decided on WordPress. I’ve made the mistake too many times now of choosing software based on the language in which it’s written. Yes, it would be nicer to be able to hack on my weblog in Ruby or Perl than in PHP, but I know enough PHP to get by, and I doubt I’ll be doing that much hacking anyway.

Setting the weblog up was fairly trivial, as most good PHP installations tend to be. Migrating all my old content wasn’t quite so simple. WordPress 2 seems to have made the import process much simpler than before; most of the information on the process I’ve found relates to older versions and isn’t really applicable any more. Unfortunately, although the simple case of importing my MT archive was fairly painless, I really didn’t want to break all my old links.

There are a few sites that discuss how to maintain your Movable Type post IDs, but they all seem to relate to the old WordPress process. So I had to get my hands dirty in PHP much quicker than expected.

Firstly, I had to edit MT/App/CMS.pm in my MT setup, adding a line to include the entry id in the export output:

AUTHOR: <$MTEntryAuthor$>
TITLE: <$MTEntryTitle$>
ID: <$MTEntryID$>
STATUS: <$MTEntryStatus$>

Then I was able to export all my posts.

I had to post-process the output file, however, as I’ve been creating my posts using the MT Kwiki plugin. This meant that none of my links imported correctly. I spent much too long wrestling with vim’s non-greedy regular expressions before giving up and processing the data in Perl instead:

perl -pe 's/[(http:.*?) (.*?)]/$2/g' mt-dump.txt |
perl -pe 's/[(.*?) (http:.*?)]/$1/g' > deWikied.txt

Then I had to persuade WordPress to maintain the MT ids. In the old WordPress import script it just inserted the posts by hand, and it was a simple matter of ‘fixing’ the SQL it used to do this. But now the importer calls the same code that is used when you create a post through the normal interface.

So I needed to add a check for the ID into the import/mt.php script:

case 'AUTHOR' :
    $post_author = $value;
    break;
case 'ID' :
    $post_ID = $value;
    break;

And then fix the call that inserts the data:

$postdata = compact('post_ID','post_author', 'post_date', 'post_date_gmt', ... );

Then I needed to adjust the wp_insert_post() call to cope with an incoming post_ID:

  if ( !isset($post_ID) )
      $post_ID = 0;
  if ( !isset($post_password) )
      $post_password = '';

and adjust its SQL accordingly

"INSERT IGNORE INTO $wpdb->posts (id, post_author, post_date, ... ) VALUES
  ($post_ID, '$post_author', '$post_date', ...)");

(The arguments are passed as an extract() of a get_object_vars(), so there’s no need to change any of the other handling).

I believe that this is a safe enough approach that won’t interfere with creating new posts or editing old ones, but you can always revert this file back after importing if there are any problems.

With this in place, I was able to import all my old posts. (There were a lot of them, so I actually had to split the file and import 4 segments in turn). The other thing that the docs don’t make clear is that you need to have an upload directory which is writable by your webserver, but that was easy enough to work out from the error message.

They all came in with the same IDs as they used to have, so then it was just a matter of setting up some Apache redirects on the old server:

Redirect permanent /nothing/index.rdf

http://nothing.tmtm.com/feed/

RedirectMatch permanent /nothing/archives/([0-9]{6}).html

http://nothing.tmtm.com/archives/$1

RedirectMatch permanent /nothing/archives/([0-9]{4})_([0-9]{2}).html

http://nothing.tmtm.com/archives/date/$1/$2

RedirectMatch permanent /nothing/archives/([0-9]{4})_([0-9]{2})_([0-9]{2}).html

http://nothing.tmtm.com/archives/date/$1/$2/$3

RedirectMatch permanent /nothing/archives/([0-9]{4})_([0-9]{2})_([0-9]{2}).html

http://nothing.tmtm.com/archives/date/$1/$2/$3

(I’ve already changed my permalink structure in WordPress to have this style of URL)

There will be many more things to change later to replicate the changes I’d made to my MT set-up, but this at least gets me up and running on WordPress.