Jeremy Zawody ably dissects the insanity that is the latest MySQL article on O’Reilly’s OnLamp. This article is written by one of the authors of O’Reilly’s latest book on MySQL, and has probably served to ensure that I don’t buy this book.
I’d add the following comments to Jeremy’s:
4. Don’t store binary data in MySQL
I rarely store BLOBs in a table with other data. But I do store them in MySQL. My reasoning is much more prosaic though. I frequently use MySQL from its command line interface, and much as it’s bad practice in real life code, I like being able to do a “SELECT *” in a query there. And I hate not being able to see all the results I’m interested in “at a glance”. Having BLOBs in the table makes the command line unusable in this manner unless you explicitly name the columns you want to see when operating in this mode – and I hate having to do that. So, I just throw all my BLOBs into a different table and get the best of both worlds.
5. Stick to ANSI SQL.
Most programmers now accept that premature optimisation is the root of all evil. But is there an equivalent warning against “premature portability”? Reese claims that making use of MySQL’s extensions “may make it impossible to port the application to another database without a significant rewrite.”
I’ve met lots of people who migrated applications away from MySQL, but none of them seemed to have any good reasons for it, and most did it as part of a complete migration away from a LAMP-style set-up to a more traditional enterprise set-up with Oracle and Java etc, where they had to rewrite everything anyway. At BlackStar we were persistently told that we’d never be able to scale a LAMP set-up, or that we’d never get funding without ‘solidifying’ our set-up. Well, we scaled it quite nicely, and the VCs (after a few initial queries) seemed quite happy by the huge difference in costs between our set-up and that of many of our peers.
If it’s a prerequisite that your software has to be database agnostic, then, yes, ignore all non-standard features in every database. But you’ll probably have to ignore quite a few ‘standard’ features too, as not every database handles them the same way.
If it’s not, then why bother? If your application is well developed you’ll abstract most of your database code away anyway. Migrating it if you ever need to may not be trivial, but it should be a long way from “impossible”.
6. Create your own sequence generation scheme.
This one has me completely boggled.
* You can have only one AUTO_INCREMENT column per table.
Yes. So? Why would I need more? This is equivalent of saying “You can have only one PRIMARY KEY per table.” I really don’t know what’s behind this, but can only assume that Reese doesn’t actually understand what AUTO_INCREMENT is used for.
* You cannot have a unique sequence for multiple tables.
Again, so what? If you really need something like this, then have a ‘mykey’ table that has a single column which is an AUTO_INCREMENTing value. Then when you need a “super-unique” value for another table just INSERT a null row into the ‘mykey’, get the value that appeared, and use it as your key in the other table.
If you find yourself doing this though, you should probably revisit your schema or approach to the problem, as you’re probably doing something wrong somewhere.
* You cannot easily determine from an application what values MySQL has automatically generated.
Umm. Why would you want to? If I have a DEFAULT value for any other column I can’t determine what values were automatically generated there either. What’s the problem?
Personally I use AUTO_INCREMENT almost everywhere. Even when I have an ‘obvious’ primary key, I rarely use it, preferring to mark it as UNIQUE instead. I’ve spent far too long in the past updating records all over the entire database because a value for a primary key changed, and was referenced in 40 other tables. I much prefer all primary keys, and thus foreign keys, to be meaningless. Then application data can change with impunity. AUTO_INCREMENT makes this style of development trivial. It takes a while to get used to, but this style of development is one of the things that makes MySQL so powerful.
7. Do not mix display code and database code.
Well, yes, but what’s that got to do with MySQL?
8. Normalize with zeal, denormalize sparingly.
Well, yes, but what’s that got to do with MySQL? Maybe this point would be interesting if Reese had pointed out that the need for denormalisation is even rarer with MySQL as it’s blinding fast even if you need a 12 table join. But he didn’t.
9. Use connection pooling in Web servers and application servers.
If you’re using a traditional database, then yes, make sure you do this. But with MySQL connecting to the database is not an expensive operation. Like many other things that take an age in other databases, MySQL is lightning fast in this regard. Yes, abstract your connection information away – you don’t want to have to update multiple files if you want to change it – but don’t worry too much about the speed here. Compared to whatever you’re going to asking the server for, the connection overhead will be tiny. Don’t worry about this one until you really need to.
For my money, the most important MySQL tip omitted is “write small queries”. One of the biggest problems with the traditional MySQL tables is their locking. If you have a table that has frequent writes, you’ll probably run into the scenario where Person A writes a SELECT that joins a lot of tables, does a complex GROUP BY and ORDER BY, and patiently waits for the result. In the meantime Person B comes along an issues an UPDATE to one of those tables. It patiently waits until Person A’s query is finished, but meanwhile Person C comes along and issues a simple SELECT on that table. Now MySQL has a problem. It’s can’t finish the UPDATE until the long SELECT has completed. And it can’t do C’s SELECT in case B’s UPDATE effects that. Because this is MySQL it may only take 10 seconds, instead of the 10 minutes it might have taken on another database, and this speed actually lulls developers into thinking that that’s OK. But in a high-traffic web site it’s probably not. If enough of these start to back up, you get all sorts of jams. Normally fast queries will start taking too long and everything will start getting sluggish. MySQL will resolve it all soon enough, but it’s not a nice experience.
The solution is simple. Avoid queries that take more than a tenth of a second to run. It’s MySQL – that should be fairly easy. Don’t be afraid to split your query into a few smaller queries. It’ll still be fast. If you need to run complex reports, then do it on ‘snapshot’ mirrors of the data that aren’t getting updated.
MySQL’s newer table types have apparently solved this problem for the most part – but the 1/10th of a second rule is a good one anyway. MySQL provides an option to log all slow queries to a file. Turn it on, and turn its setting down to 1 second (it’s the lowest it will go in the current version, but will apparently be reduced to allow sub-second setting in the next version). Then monitor what queries are getting logged – and attack them.
Every database has strengths and weaknesses. Playing to MySQL’s strengths and working around its weaknesses is fairly easy. And if you do it, you’re unlikely to ever need to port your application to a different RDBMS.