Archive

Archive for July, 2002

MySQL best practices

July 14th, 2002 No comments

Jeremy Zawody ably dissects the insanity that is the latest MySQL article on O’Reilly’s OnLamp. This article is written by one of the authors of O’Reilly’s latest book on MySQL, and has probably served to ensure that I don’t buy this book.

I’d add the following comments to Jeremy’s:

4. Don’t store binary data in MySQL

I rarely store BLOBs in a table with other data. But I do store them in MySQL. My reasoning is much more prosaic though. I frequently use MySQL from its command line interface, and much as it’s bad practice in real life code, I like being able to do a “SELECT *” in a query there. And I hate not being able to see all the results I’m interested in “at a glance”. Having BLOBs in the table makes the command line unusable in this manner unless you explicitly name the columns you want to see when operating in this mode – and I hate having to do that. So, I just throw all my BLOBs into a different table and get the best of both worlds.

5. Stick to ANSI SQL.

Most programmers now accept that premature optimisation is the root of all evil. But is there an equivalent warning against “premature portability”? Reese claims that making use of MySQL’s extensions “may make it impossible to port the application to another database without a significant rewrite.”

I’ve met lots of people who migrated applications away from MySQL, but none of them seemed to have any good reasons for it, and most did it as part of a complete migration away from a LAMP-style set-up to a more traditional enterprise set-up with Oracle and Java etc, where they had to rewrite everything anyway. At BlackStar we were persistently told that we’d never be able to scale a LAMP set-up, or that we’d never get funding without ‘solidifying’ our set-up. Well, we scaled it quite nicely, and the VCs (after a few initial queries) seemed quite happy by the huge difference in costs between our set-up and that of many of our peers.

If it’s a prerequisite that your software has to be database agnostic, then, yes, ignore all non-standard features in every database. But you’ll probably have to ignore quite a few ‘standard’ features too, as not every database handles them the same way.

If it’s not, then why bother? If your application is well developed you’ll abstract most of your database code away anyway. Migrating it if you ever need to may not be trivial, but it should be a long way from “impossible”.

6. Create your own sequence generation scheme.

This one has me completely boggled.

* You can have only one AUTO_INCREMENT column per table.

Yes. So? Why would I need more? This is equivalent of saying “You can have only one PRIMARY KEY per table.” I really don’t know what’s behind this, but can only assume that Reese doesn’t actually understand what AUTO_INCREMENT is used for.

* You cannot have a unique sequence for multiple tables.

Again, so what? If you really need something like this, then have a ‘mykey’ table that has a single column which is an AUTO_INCREMENTing value. Then when you need a “super-unique” value for another table just INSERT a null row into the ‘mykey’, get the value that appeared, and use it as your key in the other table.

If you find yourself doing this though, you should probably revisit your schema or approach to the problem, as you’re probably doing something wrong somewhere.

* You cannot easily determine from an application what values MySQL has automatically generated.

Umm. Why would you want to? If I have a DEFAULT value for any other column I can’t determine what values were automatically generated there either. What’s the problem?

Personally I use AUTO_INCREMENT almost everywhere. Even when I have an ‘obvious’ primary key, I rarely use it, preferring to mark it as UNIQUE instead. I’ve spent far too long in the past updating records all over the entire database because a value for a primary key changed, and was referenced in 40 other tables. I much prefer all primary keys, and thus foreign keys, to be meaningless. Then application data can change with impunity. AUTO_INCREMENT makes this style of development trivial. It takes a while to get used to, but this style of development is one of the things that makes MySQL so powerful.

7. Do not mix display code and database code.

Well, yes, but what’s that got to do with MySQL?

8. Normalize with zeal, denormalize sparingly.

Well, yes, but what’s that got to do with MySQL? Maybe this point would be interesting if Reese had pointed out that the need for denormalisation is even rarer with MySQL as it’s blinding fast even if you need a 12 table join. But he didn’t.

9. Use connection pooling in Web servers and application servers.

If you’re using a traditional database, then yes, make sure you do this. But with MySQL connecting to the database is not an expensive operation. Like many other things that take an age in other databases, MySQL is lightning fast in this regard. Yes, abstract your connection information away – you don’t want to have to update multiple files if you want to change it – but don’t worry too much about the speed here. Compared to whatever you’re going to asking the server for, the connection overhead will be tiny. Don’t worry about this one until you really need to.

For my money, the most important MySQL tip omitted is “write small queries”. One of the biggest problems with the traditional MySQL tables is their locking. If you have a table that has frequent writes, you’ll probably run into the scenario where Person A writes a SELECT that joins a lot of tables, does a complex GROUP BY and ORDER BY, and patiently waits for the result. In the meantime Person B comes along an issues an UPDATE to one of those tables. It patiently waits until Person A’s query is finished, but meanwhile Person C comes along and issues a simple SELECT on that table. Now MySQL has a problem. It’s can’t finish the UPDATE until the long SELECT has completed. And it can’t do C’s SELECT in case B’s UPDATE effects that. Because this is MySQL it may only take 10 seconds, instead of the 10 minutes it might have taken on another database, and this speed actually lulls developers into thinking that that’s OK. But in a high-traffic web site it’s probably not. If enough of these start to back up, you get all sorts of jams. Normally fast queries will start taking too long and everything will start getting sluggish. MySQL will resolve it all soon enough, but it’s not a nice experience.

The solution is simple. Avoid queries that take more than a tenth of a second to run. It’s MySQL – that should be fairly easy. Don’t be afraid to split your query into a few smaller queries. It’ll still be fast. If you need to run complex reports, then do it on ‘snapshot’ mirrors of the data that aren’t getting updated.

MySQL’s newer table types have apparently solved this problem for the most part – but the 1/10th of a second rule is a good one anyway. MySQL provides an option to log all slow queries to a file. Turn it on, and turn its setting down to 1 second (it’s the lowest it will go in the current version, but will apparently be reduced to allow sub-second setting in the next version). Then monitor what queries are getting logged – and attack them.

Every database has strengths and weaknesses. Playing to MySQL’s strengths and working around its weaknesses is fairly easy. And if you do it, you’re unlikely to ever need to port your application to a different RDBMS.

Tags:

How to Write Unmaintable SQL

July 5th, 2002 No comments

In complex SELECTs, alias table names to the names of other unrelated existing tables.

Tags:

Changing the Calendar in Radio

July 3rd, 2002 No comments

Mark’s tip of the day is to add a CAPTION to your blog’s calendar. However, he claims that: You can only do this in weblogging tools that … allow you to customize the HTML generated for calendars (which rules out Radio).

However, that’s not true. It’s fairly simple to change how Radio renders its calendar – once you get to grips with how Radio actually works. In your system tray you can find the little green circle on white square logo for Radio. If you right click on that you get a menu that includes “Open Radio”. This brings you into the actual Radio application – the real windows application that hides behind the web interface that most people usually use.

From the “Window” menu you should be able to find a “Radio.root” link. If you open that, and navigate through the table to system.verbs.builtins.radio.weblog.drawCalendar you’ll find the code of the macro that renders your calendar. (The outliner takes a bit of getting used to, but it’s not too tricky). Then if you Ctrl-F (Find) “hCalendarMonthYearRow” (the CSS class of the pseudo-caption), you can change the HTML it produces.

In my case, I changed it to: add ("<caption class="hCalendarMonthYearRow">" + monthYearString + "</caption>")

And, hey presto, you have a CAPTION in your table.

HTML Abstraction (Building the MDB part 3)

July 2nd, 2002 No comments

Last week we built the first few pages for The Music Database, to show how everything hangs together. Before we delve a little deeper into some more complex pages, we’ll look at cleaning up the HTML.

We left our CD details page looking like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
  [% META browser_title = "CD details" %]
  <h1>[% cd.title %]</h1>
  <h2>
    <a href="/show/artist/[% cd.artist.id %]">
     [% cd.artist.name %]
    </a>
  </h2>
  <p>Length: [% cd.length %]</p>
  <p>Tracks:
     <ol>
       [% FOREACH track = cd.tracks %]
       <li>track.title</li>
       [% END %]
     </ol>
  </p>
</code>

FireCore provides a set of macros specially designed for outputting HTML. Using these, the top half of our page above will now look like:

1
2
3
4
5
6
7
  [%
     META browser_title = "CD details";
     h1("cdTitle", cd.title);
     h2("artistName",
        a("artistLink", "/show/artist/$cd.artist.id", cd.artist));
     p("cdLength", cd.length);
  %]

So, why bother? What’s wrong with just using HTML.

Well, nothing really. If you prefer, you can just write your page in that manner. As we’ve already seen, it works just fine. But personally, I prefer a little abstraction.

As you’ll have noticed, the HTML we produced from this new version isn’t quite the same as the previous one. We didn’t say h1(cd.title) – we said h1(“cdTitle”, cd.title). If you’re actually trying out the examples as we go, you’ll have noticed that doesn’t generate the plain <h1> we had before, but adds a CSS class to it: <h1 class=”cdTitle”>. It’s always a good idea to add CSS to things so that you can style how they look in your stylesheet. But when I’m writing HTML I often forget. I’m too busy trying to ensure that I print all the right things to make sure that I print everything right.

So all our HTML macros will insist we add a CSS style by making it the first argument we pass. If we forget to pass it, and instead wrote something like [% h1("Title") %] this would become <h1 class="Title"></h1> and we’d notice that it was missing from the page and quickly fix the problem.

We can also nest tags easily and neatly:

3
4
     h2("artistName",
        link("artistLink", cd.artist.name, "/show/artist/$cd.artist.id"));

This approach also makes sure we write correct HTML. All the macros ensure that tags are correctly closed, in the correct order.

We also get the slight benefit that the HTML we generate is slightly future-proof. Our macros currently output XHTML 1.0. If XHTML 1.4 comes along in a few years time and changes how certain tags work, we should only need to change our macros file for this change to take effect across our entire site. Similarly if we wish to emit HMTL 4.2.

None of these reasons are probably enough to justify learning what is basically a new language (and a quite tricky one unless you’ve used Lisp or somesuch). But it gives us the grounding for more significant changes.

We’re probably going to be linking to artist pages quite often throughout the site. So, let’s create a macro just for that. FireCore always creates a macros/local file for us in the template tree, so we can just add it in there:

1
2
3
  MACRO link_artist(artist) BLOCK;
    link("artistLink", artist.name, "/show/artist/$artist.id"));
  END;

Now, anywhere we want to link to the artist, we can just call this template, passing it the artist object. So, we can change our CD details template accordingly:

3
     h2("artistName", link_artist(cd.artist));

Now that the link is suitably abstracted away we never need to remember what CSS class name we’re meant to use when we want a link. This small abstraction alone will probably save me hours over the lifetime of this project as I can never remember things like this and would have to always look it up when I wanted to create a new link. And if we want to change the CSS class name, or how an artist URL is constructed, or add a little icon to the link, it would now be trivial – a change we probably wouldn’t have considered before, unless it was really necessary, once the link was embedded in 30 different templates.

Now we’re starting to see the real benefits. Because setting up our Model and Controller are usually so simple, we’ll spend most of our time creating the View in Template Toolkit. So it’ll be important to find the correct abstractions there – not just of CSS styles and URLs, but of design elements. Good design is consistent across a web site, and by definition will appear on multiple pages. Whilst judicious use of CSS can help with this, many design elements are conceptually larger than a style-sheet rule. TT allows us to abstract these away also. Because I’m not a designer, I won’t be showing you much of this. But over the next few days I’ll show how keeping the templates as clean as possible will make life easier for a designer to make the site look much better than I ever could.

Tags: