Goodbye _why

August 20th, 2009

_why the lucky stiff has vanished. He was one of Ruby’s most curious characters, always treading carefully that thin line between the eccentric and the surreal. His “Poignant Guide to Ruby” was one of the best books about programming I’ve ever read, and certainly the only one to have an accompanying soundtrack album. But now he’s gone. His websites, code repositories, Twitter stream, etc. have all been deleted.

And, inevitably, some people are moaning about how terrible this is. Not just because the future will be that bit dimmer without him, but instead because of how unprofessional it is that he just vanished without warning and orphaned all his code, and how much work they’re going to have to do now to replace it all in their projects that depended on it. Somehow the actions of the developer have tainted the existing code so much that it’s now toxic to even use.

This is a perennial issue in the FOSS world, particularly where it intersects with the business sphere. Companies fear they’ll be at the mercy of developers they don’t, and can’t, control, and so some Open Source evangelists assuage their clients’ fears with a lie. Sometimes this is explicit: by promising a wealth of free (as in beer) software, created by great developers all around the world who’ll gladly work with your developers, answer all your questions, and make all the changes you require. More often it’s a lie of omission, touting some of the benefits, without mentioning the possible negatives. What they should be saying is that there is absolutely no guarantee that the people who write this code are not cranky, illogical, unprofessional, hostile, rude, crazy, or even just plain nasty. Many, if not most, of them may not be, but from a purely commercial risk management position, you’re better off assuming that they are.

Only once you truly accept this can you start to see the real benefits and opportunities. FOSS is not like commercial software, other than with a price tag of zero. Beyond a few high-profile exceptions there is no support available for most projects. You may have a helpful author, or a mailing list, wiki, or message-board filled with knowledgeable people who’ll give timely, friendly, useful advice for free. Or, you may not.

Pretending you always will is foolish. Assuming you always should is dangerous.

FOSS does not make this promise. Instead it makes a better one. It provides you the source code so that you can find a programmer anywhere in the world to fix your problem (or do so yourself if so inclined and skilled). And, better still, it uses a license that gives you the freedom to actually do this.

One of the many freedoms of FOSS is the freedom from vendor dependency. Like all freedoms, it has a price. But perpetuating the myth of a volunteer army of slaves ready to serve your every whim is just a fantasy and like all illusions, leads only to two possible outcomes: disillusionment or insanity.

I’ve never met _why and can’t claim to have known him. But he certainly never matched any of the caricatures of a FOSS developer — whether good or bad. He always trod his own path. He offered some great software to the world, simply as a gift. Some people assumed that that gift came with implied promises. They were wrong. That doesn’t detract from the software, and it most certainly doesn’t detract from _why. I hope he enjoys whatever crazy path he chooses next.

Tony

Get Excited and Make Stuff

June 29th, 2009

Last weekend I braved a visit to the UK for Social Innovation Camp Scotland. My remit had been to be a roving expert, flitting from team to team, but I was so impressed by the first team I started working with that I ended up staying with them the whole weekend — right through to the point where we won!

Yesterday the Sunday Times ran a rather hilarious scare-mongering piece about it. They quote Calum Steele, general secretary of the Scottish Police Federation, as saying: “the police service already have ways for the public to express dissatisfaction”.

The point that this so magnificently misses, however, is that those ways aren’t good enough. Local councils already have ways for the public to report potholes, graffiti, broken streetlights etc. — yet FixMyStreet flourishes. All public authorities already have ways for the public to make Freedom of Information requests — yet in just over a year WhatDoTheyKnow has already grown to handle about 10% of all such requests. The National Health Service already have ways for the public to provide feedback — yet over 7,000 people have preferred to use Patient Opinion (who have also managed to pull off the neat trick of getting the NHS to pay them to deliver complaints to them.)

There are many reasons why someone would prefer to use these sorts of sites rather than going directly. For some it’s purely practical: in many cases it’s much easier to visit a single easy-to-use site with a consistent interface rather than navigate the more, erm, interesting, waters of official sites, some of which still sport “Beware of the Leopard” signs.

For others it’s the communal nature of these sites, where others who have experienced the same problems can chip in with support and advice, or even just learn that they’re not alone.

But for me it’s all about how transparency reverses the balance of power. For too long too many government agencies have forgotten that they are meant to be our servants, not our masters. Before they will engage with us, they make us jump through hoops that do nothing but frustrate us, in the guise of making their lives somehow easier (though usually anyone with any inkling of business processes can’t help but wonder how it possibly ever could). And more often than not, complaints get the stonewall or runaround treatment, and those who persist often get little more than a bland not-quite-apology with no indication that anyone ever took the time to engage with the matter, and certainly no sign that anything might actually change as a result.

The simple act of moving all this out into the open changes things dramatically. Everyone knows that “what gets measured gets done”, and, in the UK at least, government bodies tend to be rather sensitive to what the public at large think of them. As such, rubbish that has been left in an alleyway for weeks has a habit of suddenly being collected rather quickly when there’s a public report of it on FixMyStreet for anyone browsing that Council’s page to view. Agencies tend to be less inclined to take 6 months to respond to Freedom of Information requests when anyone looking at their WhatDoTheyKnow page could see at a glance that they never meet the required timescales. (We’ve heard, for example, that the Information Commissioner’s Office love WDTK as now they get to see all manner of patterns and common problems that are missed when only dealing with complaints that get escalated to them.)

Transparency is powerful, as the UK has learned dramatically over the last couple of months. And once it’s in place, it’s extremely difficult to remove it. A central proposition of the Open Source movement has been that “given enough eyeballs, all bugs are shallow”. I wish I were witty and wise enough to come up with an equivalent for Open Government (suggestions welcome!), but even without a catch-phrase the underlying idea still holds. Government in the open will, more often than not, be better government. In some parts of the world, this is a concept that still needs to be fought for. In the rest, where there’s at least a token agreement, even if (or perhaps especially if) it’s more honour’d in the breach than the observance, then join in. Create your own site. Shine some more sunlight. You don’t need permission. You’re already in charge. Just Do It.

Tony

What is a spreadsheet-wiki?

June 3rd, 2009

While I’m on the subject of products I really want to see, it would be remiss of me not to mention the spreadsheet-wiki. This one should already exist by now, and I hold myself largely responsible for it not — after all, I spent almost a year working with Dan Bricklin and Socialtext trying to make it happen. When we parted ways, I hoped to be able to continue the project, but for a variety of reasons that never came together either. There have, from time to time, been vaguely encouraging noises from Socialtext, but this still doesn’t seem like a high priority for them, and the information that leaks out from time to time implies they’re still going down a different path. I’ve deliberately held back from talking about some of this stuff to give them a chance to get something out, but it’s 18 months now since I left, any inside knowledge I had is long past its sell-by date, and I really want to see this come together from somewhere.

By far the most common response when I tried to explain to people what I was working on, and what a spreadsheet-wiki actually meant, was “Oh, you mean like Google Spreadsheets?” But Google, and their online spreadsheet rivals, aren’t really creating what I want. Google Spreadsheets is no more a spreadsheet wiki than Google Docs is a text wiki. Yes, they’re great for collaboration, but that’s only half the wiki story. The critical other ingredient on a wiki is the humble link. Even outside wiki-land the power of the hyperlink is still poorly understood and massively underrated. It’s the fundamental building block of the Web, but even still hasn’t lived up to anywhere near its potential. Almost everyone, when they talk or write about Wikipedia, focuses on the “Anyone Can Edit!” part (whether with awe or despair), but the vast majority of readers never edit anything—the key for them is that absolutely everything is a link:


My dream is that that could also be true for numbers.

Wikipedia, of course, is full of numbers. People can talk about them, change them, cross-reference them, and do all manner of wiki goodness with them. But that’s not enough. Those numbers currently live in splendid isolation. They can’t interact in a spreadsheety way.

There have been various attempts to fix this, generally involving embedding spreadsheets into wiki pages as a replacement for plain tables. But although that achieves the goal of being able to perform some basic calculations in-place, it’s no better than being able to embed an Excel sheet in a Word document. It doesn’t solve any of the well known problems with large spreadsheets (aka Spreadsheet Hell). In a spreadsheet-wiki the spreadsheet should not be a second class citizen, subservient to the wiki. Rather, the spreadsheet should itself contain wikiness. Forget simple single sheet spreadsheets; I’m talking here about hundreds or thousands of properly cross-linked sheets, all mutually feeding each other. Forget having to email around your monthly financial statements comparing actuals to budgets with everything gradually drifting out of sync as no-one is quite sure which is the master copy any more, and no ability to examine how you got to what you have. In a spreadsheet-wiki every number is a link. You can see where it came from, and where it’s being used. If an assumption changes, everything that depends on it automatically changes. No more wasting 3 weeks in a dead end because you were working from old numbers. No more wondering why that P&L entry for “Miscellaneous Expenses” was so high in March. No more wasted time collating projections and forecasts from department heads, harmonising them into a divisional budget for the upcoming year, only to have to redo the entire process 4 times when the CFO trims your budget, or the COO explains some of the impact of a new office opening in September. Instead everyone can work on their own page, have the data pulled automatically into a series of other sheets, and have changes take effect universally and instantly whilst everyone hammers out the details — with, of course, full transparency of who changed what when (and hopefully why), and the ability to roll-back to any earlier stage.

Most of the technology to make this work already exists. There are some interesting issues when you start talking about thousands of inter-related sheets, but that can evolve when we see what the real usage patterns are. Making something come together that will show just how powerful a concept this is, is mostly just a matter of vision, SMOP, and tuits. Like any of my other ideas, I’d love to work on it, but I can’t build it on my own. If you’re interested in working with me on it, or just building it yourself and picking my brains from time to time, please get in touch.

Tony

Track Every Penny

June 1st, 2009

Personal finance software universally sucks. I have two theories for why this is:

Firstly, there actually is no personal finance software. It’s all just dumbed down versions of corporate accounting software. No matter how it’s dressed up to be ‘user friendly’, at the heart of it all is the core underlying assumption that you want to run your personal life like an accountant runs a business. Many people have been sucked into this way of thinking, in large part because it’s the mindset that using such software foists upon you, but really it’s a poor model.

The second reason is that it’s almost all created by Americans. By itself, of course, that’s not necessarily a bad thing. But it becomes a bad thing when it reflects their view of the world, which, particularly in matters of personal finance, doesn’t really hold true in other countries. And I’m not just talking here about the assumptions that are generally built into how relatively complex things like mortgages, or sharedealing, or taxes, or retirement accounts, etc work in different countries. Mostly I’m talking about the really simple things like not assuming everything is in US dollars! Sure, there’s generally a token nod to other currencies, but most finance software can’t even cope with simple multi-currency transactions (like all the times in Switzerland when I got a bill in Swiss Francs but paid in Euros), never mind the more complicated ones (like the time I bought my bus ticket from Bulgaria to Macedonia using my last remaining levs and made up the difference in Euros). And because most people who write finance software have never lived somewhere like New Zealand where the smallest coin is the ten cent piece, they tend not to cope very well with things like Swedish Rounding, where your total is rounded if you’re paying by cash (but not if by card).
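To make the Swedish Rounding case concrete, here is a minimal sketch in Python. The function name, the `method` values, and the tie-breaking rule (an amount exactly halfway between two coins rounds up) are assumptions made for illustration; real shops and countries differ on the tie-break.

```python
from decimal import Decimal, ROUND_HALF_UP


def amount_due(total, method, smallest_coin=Decimal("0.10")):
    """Return what actually changes hands for a bill of `total`.

    Cash totals are rounded to the nearest multiple of the smallest coin;
    any other payment method is charged the exact amount.
    The half-up tie-break is an assumption, not a universal rule.
    """
    total = Decimal(total)
    if method != "cash":
        return total
    # Count how many whole coins the total is closest to, then scale back.
    coins = (total / smallest_coin).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return coins * smallest_coin


print(amount_due("14.83", "cash"))  # rounded to a multiple of 0.10
print(amount_due("14.83", "card"))  # charged exactly
```

The awkward part for finance software is that the difference between the bill and the amount paid has to be recorded somewhere, and it only exists for cash payments.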

Of course it’s possible to do these things in most software, but only if, under theory #1, you can think like an accountant and do the equivalent of lots of complex ledger entries. But lots of things that are deliberately really complex in business accounting are commonplace in people’s personal lives – like asking your friend if they have eighty cents to avoid needing to break another €10 note.

Most people get round all this by just ignoring it. Who really wants to have to record that their shopping bill was €14.80 but eighty cents of that came from Steve anyway? Well, me. I’m a firm believer in the “track every penny” school of thought. And I hate how hard it is for me to do so. In the 5 months of this year so far, I’ve been in 9 different countries. I keep every receipt, and record every item from every one of them. And it’s much too much hassle. I’ve tried numerous different software packages, and they’re all terrible for me. There’s a lot of innovation in the area recently with sites like Mint and Wesabe springing up and giving the old faithfuls of Quicken and Microsoft Money a serious run for their money. But there’s increasingly an assumption that most of your spending detail can be automatically obtained from your bank records so you don’t need to type it in. It makes sense for them to concentrate there, as having to painstakingly enter all your spending is the thing that puts most people off ever actually keeping track of where their money is going. But the more that part gets automated away, the less these companies work on making it really easy to enter transactions manually — which leaves me worse off, as I do the vast majority of my spending using cash, and my bank records thus tell me next to nothing. Even on the rare occasion where I pay for my groceries on my debit card, I don’t just want a total spend entered—I want a full breakdown of every line item. I want to know at the end of the year just how much I spent on milk or eggs, not just on “groceries”.

So I want software that works for me. That assumes I’ll be travelling a lot and working with multiple currencies. That makes it easy for me to enter detailed records rather than a chore. That deals with all the little details I raised last time I ranted about this.

It may be that I’m the only person in the world that actually wants this software, but I suspect I’m not. In the current economic climate people are watching their pennies carefully. Almost every personal finance book suggests that people literally track every cent they spend for at least a month. I think lots of people would like to know much more about where their money goes, but the pain of keeping track currently outweighs the benefits for a lot of people. So I want to make that easy.

I have a detailed vision of how that software would work, but I can’t build it by myself. Anyone want to help?

Tony

Where can I fly to this month?

June 1st, 2009

All my playing with end-of-year travel plans has given me itchy feet. I’d like to go somewhere interesting for a few days sometime soon, but I don’t really care so much where. This is something the internets are meant to help with, but though the US is well served with any number of useful quirky travel sites, Europe doesn’t have so many of the “Just show me good deals” versions if you don’t live in certain key cities. So, in the DIY spirit, I wrote my own. I gathered a list of all the commercial airports I could find in Europe, grouped them by country, and wrote a script that searched on ITA in turn for all flights from Tallinn to any airport in that country over the next 30 days, and told me the cheapest date to travel there. It’s a slightly nasty site to screen-scrape (and I’m pretty sure they don’t have any alternatives that you don’t have to pay for, as some of the puzzles they set job applicants involve scraping the site), and the code certainly isn’t pretty, but, thanks to Google Charts, the results are:

(Green is the cheapest, red the most expensive, yellow somewhere in between.)
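The scraping itself is too site-specific to be worth showing, but the step that boils the search results down to one cheapest date per country is simple. This is a minimal sketch in Python, not the original script: the function name, the tuple layout, and the sample fares are all made up for illustration.

```python
from datetime import date


def cheapest_per_country(fares):
    """Reduce (country, travel_date, price) tuples to the cheapest fare per country.

    Returns a dict mapping country -> (travel_date, price).
    If two dates tie on price, the first one seen is kept.
    """
    best = {}
    for country, travel_date, price in fares:
        if country not in best or price < best[country][1]:
            best[country] = (travel_date, price)
    return best


# Placeholder data only: these are not real fares.
sample_fares = [
    ("Latvia", date(2009, 6, 10), 80),
    ("Latvia", date(2009, 6, 12), 45),
    ("Spain", date(2009, 6, 20), 210),
]

for country, (day, price) in sorted(cheapest_per_country(sample_fares).items()):
    print(country, day.isoformat(), price)
```

Sorting the resulting prices and splitting them into bands is then enough to drive a three-colour map.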

My plan is to widen this beyond Europe, have it run every day, set some thresholds and have it email me any time something interesting appears. I suspect, however, that I’m much better served from Riga:

Thankfully there’s a comfortable bus to there!

Tony

More bmi Hacking

May 26th, 2009

Star Alliance claim to be ‘committed to delivering to you the latest flight schedules from the Star Alliance members on multiple platforms Anytime, Anywhere.’ (emphasis mine). What’s more, they go on to explain that this means it will be ‘Automatically updated on your platform of choice.’

That is unless your ‘platform of choice’ is anything other than a Windows PC or a handheld with Palm OS, as their Electronic Timetable doesn’t run on, for example, a Mac. Instead we need to just make do with a hulking big PDF.

So, I decided to parse all the data out of that PDF, and on the basis that others might find it useful, make it available as a CSV file: Star Alliance Timetable 2009-05.

It’s nothing fancy, but being able to open it in Excel and filter on the various columns is still quite useful, and of course it opens up any number of other possibilities. I’m also considering building a little mini-application that makes it easier to play with, so if anyone has any suggestions for that, I’m all ears.

Tony

bmi Hacking

May 25th, 2009

I’ve been a bmi Diamond Club holder for many years. Unlike most Frequent Flier programs, airmiles you earn in this scheme never expire, so I’ve built up quite a few of them. However, it’s looking increasingly likely that bmi won’t actually be around for much longer — at least not in its current form. The most likely outcome seems to be a takeover by Lufthansa, and subsequent conversion of Diamond Club to their nowhere-near-as-good Miles and More scheme. So it’s looking like a good time to turn all my airmiles into a fun end-of-year escape-the-Tallinn-winter trip.

I’ve spent quite a bit of time over the last week learning how best to go about that, and discovering all manner of interesting ways of combining the various rules. (Much of this is learned from the fine folks at Flyer Talk, which, once you can get beyond all the jargon, is an amazing source of tips, tricks, and useful advice.)

The first thing you need to get the hang of is the bmi zone chart. Rather than spending miles based on the actual distance you fly, the world is divided up into a series of zones, and you pay a fixed rate per flight based on the zones you’re flying to/from. (This is purely in terms of the miles spent—you still need to pay the taxes depending on the airports you use, which, of course, differ everywhere.) I found it hard to keep track of which countries were in which zone, so I drew a pretty map.

The biggest problem with constructing a suitably interesting trip is that you’re only allowed one stop-over (visiting a city en-route for more than 24 hours) per ticket. So, for example, if you were to book a return from London to Sydney you’d only be allowed to stop off in one other place (e.g. Singapore) in either direction. However, you can purchase one way tickets, so by getting two of those, instead of a return, you now get a stop-over in each direction, so could stop, for example, in Singapore for a couple of weeks on the way there, and Thailand on the way back.

What I then noticed was that to go from Zone 2 (Central/Eastern Europe — where I currently am) to Zone 10 (Australia/NZ — where I want to go) is 50,000 miles each way, but two singles from Zone 2 to Zone 8 (East Asia) and then Zone 8 to Zone 10 are only 25,000 each. Thus, by going via South Korea or Japan, for example, you can effectively get 3 free stops in each direction – effectively turning a naïve two destination trip (e.g. Copenhagen – (Bangkok) – Auckland – Copenhagen) into a seven destination trip for the same price (e.g. Copenhagen – (Bangkok) – Tokyo – (Hong Kong) – Auckland – (Sydney) – Seoul – (Delhi) – Copenhagen)! These are all published Star Alliance routes: Air Asiana, for example, fly Seoul to Copenhagen via Delhi and Zurich three times a week.

If you really wanted to, you could also (again, for the same price) omit the last ticket, and return Auckland–Copenhagen via L.A. or Vancouver turning it into a complete round the world trip at half the mileage cost of an actual round-the-world ticket!

I wrote a little script to analyse the entire zone chart for other free multi-zone detours, and discovered there were quite a few of them (including some where the detour actually lowered the total price, such as Zones 2–7 via 10 which is only 70,000 miles, instead of 80,000 direct!)
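A search like that can be sketched in a few lines of Python. This is an illustration of the idea rather than the script itself. The chart below holds only the handful of prices quoted or implied above, treated as one-way prices that are the same in both directions (an assumption); the Zone 7 to Zone 10 figure is inferred from the 70,000 total and is not a quoted price. The real chart has many more entries.

```python
from itertools import permutations

# One-way award prices in miles, keyed by the unordered pair of zones.
# Partial and partly inferred: see the caveats in the paragraph above.
CHART = {
    frozenset({2, 10}): 50_000,
    frozenset({2, 8}): 25_000,
    frozenset({8, 10}): 25_000,
    frozenset({2, 7}): 80_000,
    frozenset({7, 10}): 20_000,  # inferred, not quoted
}


def price(a, b, chart=CHART):
    """Return the price between two zones, or None if it is not in the chart."""
    return chart.get(frozenset({a, b}))


def free_detours(chart=CHART):
    """Find (start, via, end, detour_total, direct) where the detour costs no more."""
    zones = sorted({zone for pair in chart for zone in pair})
    found = []
    for start, via, end in permutations(zones, 3):
        if start >= end:
            continue  # prices are symmetric, so report each pair of ends once
        direct = price(start, end, chart)
        first = price(start, via, chart)
        second = price(via, end, chart)
        if direct is None or first is None or second is None:
            continue  # not enough data in this partial chart
        if first + second <= direct:
            found.append((start, via, end, first + second, direct))
    return found


for start, via, end, total, direct in free_detours():
    print(f"Zone {start} to {end} via {via}: {total} miles (direct: {direct})")
```

Extending it to longer chains is a matter of searching paths through the chart rather than single detours, at which point the stop-over rules matter as much as the prices.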

Of course, the longer the route, the more complexity there is in trying to piece it all together.  You get significantly more value spending the miles on business class flights than on economy, but availability on those disappears quite far in advance on popular routes (and isn’t available at all on many Singapore Airlines flights as they reserve those for their own card-holders rather than their Star Alliance partners). But I’m currently contemplating trying to piece together a 2-10-7-9-8-2 route, which is only 110,000 base miles, and would theoretically allow something along the following lines:

Riga – (Cairo) – Bombay – (Bangkok) – Manila – (Tokyo or Sydney) – Auckland – (Shanghai) – Tashkent or Almaty – (Istanbul) – Riga.

Which, if I can pull it off, isn’t bad for only 10,000 miles more than a simple Riga–Auckland return! Suggestions / alternatives / gotchas / etc. welcomed!

Tony

Splitting a Wordpress blog in two

May 13th, 2009

This blog had its seventh birthday recently. I know there are many amongst you who have been blogging since before the term was even coined, and who make more posts in a month than I’ve made in seven years, but still.

Anyway, back in the early days of blogging, a significant percentage of blog posts weren’t original content, but the equivalent of retweeting: a way of passing on to your readers something interesting you’d read elsewhere. Of course the vast majority of those were links to other people’s blogs. It’s how word spread about interesting posts before digg and reddit and twitter and the like.

I tried to do something slightly different for a while: rather than just regurgitating other blog posts, I instead regurgitated interesting snippets from real dead tree books I was reading, picking interesting excerpts chapter by chapter.

It seemed to be well received, and I had a lot of fun choosing which couple of paragraphs from each chapter could convey something interesting enough to both stand alone without the surrounding context and also encourage others to seek out the book for more depth.

Early in 2004, I seem to have abandoned the idea. Likely it’s just because I was super-busy with Twingle, and then with Unite, and I probably always meant to get around to picking it up again, but just never did. Until now.

I decided, however, to do this on a new separate blog: dustyvolumes.com. So I had to work out how to move all the old posts to there. This was significantly more complicated than I expected. Doubtless someone will point me to a Wordpress plugin that could have made the whole thing take 30 seconds, but in the absence of that, here’s the gory details for anyone else who ever wants to do something like this.

First, of course, I needed to have the new blog set up. I’m assuming that’s self-evident, and needs no further explanation.

Next I needed to find all the posts I wanted to move. I already had them all tagged with “Books”, so this part was fairly easy and avoided an even longer manual process. Wordpress doesn’t have an ‘export by tag/category’ option, though—the only way to restrict an export is by author. So I had to go into “Posts > Edit”, find a post with the relevant tag, and click that tag to give me a list of all those posts. Then I could do a Bulk Edit of each to change the author to a new temporary account I set up just for this purpose. There were multiple pages of them, and there doesn’t seem to be a way to operate on more than one page at a time, so I went through them page by page. It was repetitive enough to make me want to find a short-cut, but there weren’t quite enough pages to make it worthwhile.

Then I exported all the posts by my new author, and imported those into the new blog. I did some more tidying up there of tags and categories etc, and found a few posts that should probably still remain on this blog instead (they were tagged with Books too, but were, for example, about me getting rid of my collection before moving to Estonia, rather than being excerpts suitable for Dusty Volumes), so deleted them from there, and changed the author here back to me on each of them in turn (I wanted that author to match exactly the posts that were on the other blog so I could continue to operate on those here).

Now I had the new blog working, but hit the much harder problem of what to do about the posts here. I could, of course, just have deleted the posts that I’d moved, but I still get quite a few hits on them from Google searches and links from other blogs, as well as some internal links to them, and I didn’t want to break all those. After some research I found a couple of Wordpress plugins for setting up redirection. The first one I tried, “Redirection”, has lots and lots of features, but wasn’t quite what I wanted. The second, “Redirect”, was perfect. It does only one thing, but does it simply, and does it well. Using the Custom Field options in Wordpress, it lets you set a ‘Redirect’ field with a value of the URL that viewers should be redirected to on viewing a given post. So now it was just a matter of going through and setting those up one by one.

Thankfully the Wordpress import maintains the post ID from the export, so I didn’t need to spend any time building a map of which IDs should map where: each relevant post would just need to redirect to http://dustyvolumes.com/archives/<id>. I did a couple of these manually to make sure everything was working, but there was no way I wanted to do another 150 or so by hand. It was time to go to the database.

I’ve never actually explored the Wordpress schema before, but there aren’t very many tables, and it’s fairly easy to work out what’s going on. (There’s probably decent documentation for it all too, but I tend to prefer to just work things like this out manually.) I’m not going to detail all the SQL commands I had to run: if you don’t know enough to work them out yourself you probably shouldn’t be playing with the database directly anyway, and should just do this the longwinded way (and I really don’t want to be fielding questions on it 6 months from now when the schema has changed). But it was a simple matter to just select the IDs of all posts by my fake ‘author’, and insert the relevant Redirect custom field values.

However, this still left a large number of ‘Books’ entries in my tagcloud that really weren’t there any more, so I also wanted to remove all the tags from these posts too. Ideally the Bulk Edit should be capable of this, but it currently only allows you to add a tag to multiple posts, not remove one, so again I went to the database. This one was slightly trickier, as it’s a cross-table DELETE, but again, if you don’t know how to do that, you shouldn’t just be pasting in random SQL you found on someone’s blog somewhere.

Unfortunately, although that successfully removed all the tags, the tag cloud still proudly declared that I had a huge number of “Books” posts. Wordpress, presumably for speed, keeps a total of how many posts are assigned to each category in a different table, and, being a typical modern webapp, maintains that count in client code rather than in the database itself. So having manually removed lots of tags without updating the count field too, my database was now out of sync with itself. MySQL doesn’t do cross-table UPDATEs with aggregates, so this time I needed an UPDATE with a subselect of a COUNT(*).
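For the curious, the general shape of that last fix (recomputing a cached count from the rows it is supposed to summarise) looks something like the sketch below. It is deliberately not the Wordpress version: the table and column names are invented, the data is a toy, and it uses SQLite through Python rather than MySQL so that it runs anywhere. Treat it as an illustration of the technique only.

```python
import sqlite3


def build_demo_db():
    """Create a toy schema whose cached count for 'Books' is stale."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT, post_count INTEGER);
        CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER);
        -- 'Books' still claims 150 posts, but its link rows have been deleted.
        INSERT INTO tags VALUES (1, 'Books', 150), (2, 'Travel', 2);
        INSERT INTO post_tags VALUES (11, 2), (12, 2);
        """
    )
    return conn


def resync_counts(conn):
    """Recompute every cached count with a correlated COUNT(*) subselect."""
    conn.execute(
        """
        UPDATE tags
           SET post_count = (SELECT COUNT(*)
                               FROM post_tags
                              WHERE post_tags.tag_id = tags.id)
        """
    )
    conn.commit()


conn = build_demo_db()
resync_counts(conn)
print(dict(conn.execute("SELECT name, post_count FROM tags")))
```

The same caution applies as with everything else here: run the subselect on its own first, and take a backup before letting the UPDATE loose.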

Including lots of cautious exploratory SELECTs, lots of LIMITs of my UPDATEs and DELETEs to make sure the right thing was happening each time, and backing up carefully after each major change, the whole thing took about an hour. I could possibly have done it all via the web interface in that time, but it would have been a close call, and there was a very high chance that I’d have gotten so bored in the middle of it that I’d have abandoned it half-way through, promising to finish it another day (and likely never quite gotten around to it). This way was mentally stimulating rather than draining, thus giving much more satisfaction when done, and I learned much more about the Wordpress database structure that could be very useful if I ever decide to write a Plugin.

And now I have two blogs to rarely write in…

Tony

Comparative Government

April 6th, 2009

Matt Wardman, asking what a “bicycling Parliament” would look like, compares the salary and benefits packages for Norwegian MPs to those in the UK. I’m perennially dismayed by how infrequently this sort of comparison takes place, particularly in Britain. It’s as if there’s a feeling of “We invented modern democracy and everyone should be studying us. What could we possibly learn from anyone else?”

I’ve recently been comparing the UK’s Freedom of Information laws to those of other countries, and the answer there, as always, is “quite a lot, actually”. (Particular kudos on that one to Estonia where virtually all government information is automatically electronically published and doesn’t need to be specially requested.)

Generally, however, making such comparisons is trickier than it ought to be. Unless you can find a report from some organisation that has published the results of a comparison in a particular area, it requires lots of searching through primary sources and trying to work out whether you’re comparing like with like. And that’s not including all the cases where what the laws say bears only a vague resemblance to what actually happens.

Wikipedia is a good starting place for high-level information, and provides a basic (and generally well-referenced) comparison on economics, tax rates, and some legal topics (e.g. Freedom of Information and Age of Consent). There are also good pages gathering country-specific information from a variety of sources (e.g. visa-free travel with a British passport), but on other topics (e.g. comparing the remuneration of MPs or equivalent around the world) the information is either well hidden or not gathered.

Is there some other source for this sort of comparative study, generally? If not, should there be, and if so where? Is it just a matter of seeding pages on Wikipedia for the relevant topic, and hoping people flesh it out and keep it up to date? Or is there a better alternative?

Tony

Gradual deployment of schema changes

February 16th, 2009

Timothy Fritz has a very interesting blog post on Continuous Deployment at IMVU (subtitled “Doing the impossible fifty times a day”), detailing how all committed code gets automagically pushed to their cluster of servers assuming it passes all tests. One very nice aspect of it is that the change is first put live on only a small set of their machines. Then if there’s any significant variation in a series of metrics tested on those machines (load average, errors generated, etc.) the revision is automatically rolled back rather than pushed to the remainder of the cluster.
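To make the rollback decision concrete, here is a rough sketch of how such a check might look. This is not IMVU's actual code: the metric names, the Metrics class, and the 25% tolerance are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical health metrics sampled from a group of servers."""
    load_average: float
    errors_per_minute: float


def should_roll_back(baseline: Metrics, canary: Metrics, tolerance: float = 0.25) -> bool:
    """Return True if any canary metric is more than `tolerance` (as a
    fraction) above the baseline taken from the rest of the cluster.

    Note that a baseline of zero means any positive canary value trips it.
    """
    pairs = [
        (baseline.load_average, canary.load_average),
        (baseline.errors_per_minute, canary.errors_per_minute),
    ]
    return any(new > old * (1 + tolerance) for old, new in pairs)


baseline = Metrics(load_average=1.0, errors_per_minute=2.0)
print(should_roll_back(baseline, Metrics(1.1, 2.0)))  # False: within tolerance
print(should_roll_back(baseline, Metrics(1.1, 9.0)))  # True: error rate jumped
```

A real system would presumably compare many more metrics over a time window, but the shape of the decision is the same: push to the rest of the cluster only if the small set looks no worse than everything else.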

In the comments someone raises the question of how such a system can work when database schema changes are required, describing this as the “achilles heel of partial cluster deployment”.

At BlackStar we didn’t have a system anywhere near this advanced, but we did have a requirement to have as close to zero downtime as possible and so we needed to come up with a system for putting database changes live in a way that couldn’t break code in the meantime.

One of the most common schema changes in an evolving system is the gradual migration of all 1-1 relationships to 1-many or many-many. (Someone recently posited that a database archaeologist could tell the age of a system by how many 1-1 relationships still existed. I can’t remember who or where, though. Leave a comment and I’ll credit them.)

So, for example, when you start out, it’s common to have an ‘email’ column in a ‘user’ table. Eventually, though, it will become necessary to handle a user who needs to use two or more different email addresses. The obvious solution is to split out an ‘email’ table, migrate all the existing data into it, and update the code to use that table instead of the ‘user’ table. However, when you can have different machines potentially running different versions of that code (the “before” and “after” versions) simultaneously, then you have problems. If you put the database schema changes live first, then the “before” versions will suddenly break. If you put the code live first, then the “after” version won’t work until you change the schema. In an environment where down-time is acceptable, then you just turn everything off, make the schema changes, push the new code, and you’re fine. But what to do when it isn’t?

Well, then you need to do everything in stages:

  1. First, you need to create the new table. No code uses it yet, it’s simply a schema change, so you can safely make it go live.
  2. Once deployed, you change any code that writes email addresses to write to both places. Users are still only allowed a single email address, but now that gets inserted into both the ‘user’ table and the ‘email’ table. Under normal circumstances such duplication is bad, but it’s only a temporary measure. Everything will be properly normalised when we’re finished.
  3. Once that code is successfully live everywhere, you can then run a migration on all the existing data. Any new email addresses being added in the live system are being added to both tables, but before we can change any code to read from the new table, we need to make sure it’s comprehensive. So all pre-existing addresses need to be migrated. For a simple case like this you can probably run a single SQL command; for more complex scenarios you may need a more involved script – but for those you may be better off breaking it down into a series of migrations like this.
  4. Once you’re sure that both tables are perfectly in sync, and are staying that way, you can start to migrate all code that reads email addresses to use the new table. This doesn’t have to happen all at once. In a well-factored system the scope of this change should be very small, but in reality you’re likely to have code strewn all over the place that reads this data. But the doubled data source means those reads can gradually be migrated one by one without blocking any other changes. (At BlackStar we generally made such changes very quickly as we couldn’t put the new functionality we wanted live until we were complete, but we also had a couple of cases where it was a much longer process that took several months to change all the code to use the new table.)
  5. Eventually, when you’re sure that no code reads from the old table you can remove the old code that writes to it, leaving, of course, the code that writes to the new table.
  6. Once that’s live everywhere, you can delete the column. Or, if deleting a column takes too long on your system and might cause some downtime, you can just delete all the data from it, record by record if needs be. (Or, of course, you can apply a similar multi-step approach to create a new user table without this column, migrate all the code to use it instead, and then delete the old one.)
  7. Now you have a system that, on the outside, functions identically to when you started – users can still only have a single email address. But that is no longer true of the underlying data schema. So you can now take whatever code imposes this restriction and fix it to allow for multiples without worrying about bringing the database into sync.

It’s a much more involved process, but at every step everything is consistent no matter which version of the code is active on a given server, everything continues to run safely, and there’s no need for any down time.
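The first four stages above can be sketched roughly as follows, again using Python's sqlite3 module as a stand-in for a real database. The ‘user’ and ‘email’ table names come from the example above; the column names, the helper functions, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO user VALUES (1, 'alice', 'alice@example.com'),
                            (2, 'bob', 'bob@example.com');
""")

# Stage 1: a schema-only change. No code uses the new table yet.
conn.execute("""
    CREATE TABLE email (
        user_id INTEGER NOT NULL REFERENCES user(id),
        address TEXT NOT NULL,
        UNIQUE (user_id, address)
    )
""")


# Stage 2: the write path now writes to both places. Users are still
# limited to one address, so the old row in 'email' is replaced.
def set_email(conn, user_id, address):
    with conn:  # one transaction, so the two copies cannot drift apart
        conn.execute("UPDATE user SET email = ? WHERE id = ?", (address, user_id))
        conn.execute("DELETE FROM email WHERE user_id = ?", (user_id,))
        conn.execute("INSERT INTO email (user_id, address) VALUES (?, ?)",
                     (user_id, address))


# Stage 3: backfill pre-existing addresses. OR IGNORE skips rows that the
# dual-write code has already copied, so it is safe to re-run.
def backfill_emails(conn):
    with conn:
        conn.execute("""
            INSERT OR IGNORE INTO email (user_id, address)
            SELECT id, email FROM user WHERE email IS NOT NULL
        """)


# Before stage 4: check that every address in 'user' is also in 'email'.
def tables_in_sync(conn):
    missing = conn.execute("""
        SELECT COUNT(*) FROM user
         WHERE email IS NOT NULL
           AND NOT EXISTS (SELECT 1 FROM email
                            WHERE email.user_id = user.id
                              AND email.address = user.email)
    """).fetchone()[0]
    return missing == 0


# Stage 4: readers move over to the new table, one call site at a time.
def get_email(conn, user_id):
    row = conn.execute("SELECT address FROM email WHERE user_id = ?",
                       (user_id,)).fetchone()
    return row[0] if row else None


set_email(conn, 1, "alice@new.example")  # dual write, before the backfill
backfill_emails(conn)
print(tables_in_sync(conn))  # True
print(get_email(conn, 2))    # bob@example.com
```

Stages 5 to 7 are then mostly deletions: drop the UPDATE from set_email, remove the old column, and finally relax the one-address restriction in the write path.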

The actual time that it takes to get from stage 1 to stage 7 depends not only on how long it takes to develop the code changes, but also on the gap between each deployment. If you only deploy changes once a week, it can take a few months to work through all the steps. If, however, you can get to a position where you can safely deploy multiple times per day, then you can of course be finished much, much quicker. And if you only deploy once a month, or even once a quarter, well, then you have even bigger problems.

Tony