Autonomic Computing

Bob Morris’ keynote started with a call not only to keep history in mind, but also to look at where we’re going. Over the last hundred years the computational power that $1000 buys has increased by 14 orders of magnitude – and the increase is still accelerating. But the balance of IT costs has reversed: management (your sysadmin) now accounts for two-thirds of the TCO for storage, against one-third for the storage itself (hardware and software). Labour is not only the major cost of technology now; it is also a major cause of availability problems. A recent survey of the causes of downtime attributed 20% to hardware, 40% to software, and the other 40% to operator error. And as eBay, AOL, etrade, Schwab, etc. have discovered in the last few years, that downtime can be very expensive. Technology needs to get better at managing itself.

The history of computing is a series of massive simplifications to the user experience that drive the technology (timesharing -> PCs -> GUI -> Web -> ???). But complex heterogeneous infrastructures are hard: there are thousands of tuning parameters on hundreds of components (a web site has to have properly configured firewalls, DNS services, caches, web servers, etc.).

So, Morris wants to see autonomic systems that are:

self-configuring (can adapt to their environment)
self-optimising (can monitor, and tune)
self-healing (can discover, diagnose, and react)
self-protecting (can anticipate, detect, and protect)

Is this a pipe dream? Much of it is already happening – it’s just not holistic yet.

As an example take RDBMS query optimizers. By considering their environment, and considering the data they need to search, good optimizers can get 2 or 3 orders of magnitude performance increase (cf. compilers which are deemed to be good if they can get 20%-30% improvement). And now we’re starting to see learning optimizers, which can keep statistics on how they’re performing (disk space is cheap enough and plentiful enough now to actually keep large log files). Then they can make adjustments if they discover that they’ve gotten their cardinalities wrong.
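The feedback idea behind a learning optimizer can be sketched in a few lines. This is only an illustration, not how any particular database does it: the class name, the `smoothing` parameter, and the example predicate are all hypothetical. It logs the ratio between estimated and actual row counts for each predicate and scales later estimates by that learned ratio.

```python
from collections import defaultdict


class CardinalityFeedback:
    """Remembers how far row-count estimates were from what queries returned."""

    def __init__(self, smoothing=0.5):
        # smoothing: weight given to the newest observation (hypothetical knob)
        self.smoothing = smoothing
        # per-predicate multiplier; 1.0 means "trust the raw estimate"
        self.correction = defaultdict(lambda: 1.0)

    def estimate(self, predicate, raw_estimate):
        """Scale the optimizer's raw estimate by the learned correction."""
        return raw_estimate * self.correction[predicate]

    def record(self, predicate, raw_estimate, actual_rows):
        """Log one executed query and move the correction towards the observed ratio."""
        if raw_estimate <= 0:
            return  # nothing sensible to learn from a zero estimate
        observed_ratio = actual_rows / raw_estimate
        old = self.correction[predicate]
        self.correction[predicate] = (
            (1 - self.smoothing) * old + self.smoothing * observed_ratio
        )


feedback = CardinalityFeedback(smoothing=0.5)
# The optimizer guessed 100 rows, but the query actually returned 1000.
feedback.record("orders.status = 'open'", raw_estimate=100, actual_rows=1000)
adjusted = feedback.estimate("orders.status = 'open'", raw_estimate=100)
```

After one badly wrong guess the correction moves halfway towards the observed ratio, so the next estimate for the same predicate is much closer to reality without overreacting to a single query.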

For Homogeneous Components Interacting, he gave the examples of adaptive network routing and high-availability clustering. He explained how Oceano’s multiple-customer server farms use virtualized hardware and virtualized software to provide clustering for multiple web sites across shared servers. Although this is not a new technique, it’s still not common, as many clients still don’t really trust the security of sharing servers. However, costs are now reaching a point where the difference in price is significant enough to convince many people to swallow their fears!

He also described recent advances in storage technology with collective intelligent storage bricks. These have much higher redundancy than RAID, and cool performance hotspots by taking proactive copies. With sufficiently improved sparing you can eliminate the need for repair actions for the life of the system. This has the added benefit of no longer needing a 2d packing structure, as you no longer need to be able to walk around the machine to pull out drives. If you don’t need to replace the ‘bricks’ you can have a 3d structure, and with better cooling systems you can now get a petabyte of storage in a small cube.
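To get a feel for what “sufficient sparing” might mean, here is a rough back-of-the-envelope sketch. It is not from the talk: it assumes bricks fail independently, each with the same probability `p_fail` over the system’s life, and the function names and any numbers you feed it are purely hypothetical. It asks how many brick failures the cube must be able to absorb so that, with a chosen probability, nobody ever has to open it up.

```python
from math import comb


def survival_probability(bricks, spares, p_fail):
    """Probability that at most `spares` of `bricks` fail (independent failures assumed)."""
    return sum(
        comb(bricks, k) * p_fail**k * (1 - p_fail) ** (bricks - k)
        for k in range(spares + 1)
    )


def spares_needed(bricks, p_fail, target):
    """Smallest number of tolerated failures whose survival probability reaches `target`."""
    for spares in range(bricks + 1):
        if survival_probability(bricks, spares, p_fail) >= target:
            return spares
    return bricks  # fallback if rounding keeps the sum just below the target


# Hypothetical inputs: 100 bricks, each with a 10% chance of dying over the system's life.
tolerated = spares_needed(bricks=100, p_fail=0.1, target=0.999)
```

Real failures are rarely independent (shared power, shared cooling, bad batches), so treat this as a lower bound on the thinking involved rather than a sizing tool.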

He then went on to talk about the cost of managing client machines used by non-techies, which usually comes to around 50% of total time/cost, and promoted the idea of subscription computing. Many organisations have started to use this for customisation, personalisation, protection, problem detection, software updates, etc. But again it’s not that widespread, and not very holistic. IBM have recently started to reintroduce the old mainframe “hypervisor” concept, which lets a single machine run multiple operating systems, and which Morris thinks will make this much easier.

Morris maintains that we need to focus on availability, maintainability, scalability, cost and performance. Systems need to be usable by millions of people, but managed by half a person. This is a hard problem, which won’t be solved overnight, and needs the participation of academia, government and industry. More information on the project is available at http://www.ibm.com/research/autonomic

Understanding Nothing

Tony Bowden's ramblings
