Archive

Posts Tagged ‘MDB’

XSLT, Perl, Haskell, and a word on language design

August 14th, 2002

An interesting thread over at kuro5hin on the tribulations one person had when writing a filter to convert XML documents to LaTeX using XSLT (with examples of how some of the transformations would be handled using Perl or Haskell instead).

My initial reaction when faced with problems such as the one described in this post (lots of highly repetitive code) is always to use a different language to generate the code – e.g. use Perl to generate the repetitive XSLT. Then if you want to add a new substitution, just add it in Perl, and generate the resulting XSLT.
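As a rough sketch of that idea (this is not code from the kuro5hin thread, and the element names and LaTeX commands in the table are made up for illustration), a few lines of Perl can stamp out one XSLT template per substitution:

```perl
use strict;
use warnings;

# Hypothetical table of substitutions: XML element name => LaTeX command.
my %latex_for = (
    code   => 'texttt',
    emph   => 'emph',
    strong => 'textbf',
);

# Emit one XSLT template that wraps an element's content in a LaTeX command.
sub xslt_template {
    my ($element, $command) = @_;
    return join "\n",
        qq{<xsl:template match="$element">},
        '  <xsl:text>\\' . $command . '{</xsl:text>',
        '  <xsl:apply-templates/>',
        '  <xsl:text>}</xsl:text>',
        '</xsl:template>',
        '';
}

# Adding a new substitution means adding one line to the table above.
print xslt_template($_, $latex_for{$_}) for sort keys %latex_for;
```

The repetitive part lives in one sub, so the XSLT itself never has to be edited by hand.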

I still think that I’ll probably end up going down that path for the Music Database (probably using Template::Toolkit to process the .xsl files when they’re requested), but first I’m going to see how far I can stretch the abstraction in XSLT itself.

In the story above, several people pointed out better approaches to the problem, which make the ‘correct way’ much more readable and maintainable. But it’s still not great.

XML, per se, is more verbose than I’d usually like to deal with, so I’m probably going to end up with some level of abstraction into this – but I don’t want to get there too early.

Tony

Functional Programming with XSLT – A proof through examples

August 14th, 2002

This paper sets out to show that XSLT is a true functional language, by implementing 35 of the most common functions that you would encounter in such a language (foldl, map, minimum, sum, sumTree etc)! It’s long (the PDF version is 76 pages) – mostly because of the code: just because XSLT can implement all these things, that doesn’t mean it’s easy, simple, elegant or clean!

I’m enjoying playing with the new musicdatabase chain of CDDB data -> MySQL -> SQL -> Class::DBI -> FireCore -> Template::Toolkit -> XML -> XSLT -> HTML -> CSS -> display, but XSLT is jumping out as by far the most verbose, ugly, and unreadable link in that chain. It hasn’t been around long enough yet, or been used widely enough yet, for there to be much written on the speed/cost of development, or maintainability.

In theory, the really complex stuff should be sufficiently abstractable to allow for building up quite powerful libraries. Although XSLT doesn’t support higher order functions directly, it’s possible to simulate them, as this paper (and its examples) shows. If this sort of thing can be turned into a “sufficiently encapsulated hack”, then it may indeed be possible to write XSLT relatively quickly and neatly.

Tony

XSLT Abstraction

August 14th, 2002

I’m trying to learn how to abstract common elements of XSLT away. For example, the MusicDatabase’s shiny new XML output gives the run-length of a CD, or a track on it, in seconds. But, for output, we’d really like to show it as minutes and seconds (i.e. 308 would become 5:08).

This is fairly simple to do:

  <!-- floor() truncates; format-number(..., '#') would round 350 up to 6:50 -->
  <xsl:value-of select="floor(runlength div 60)"/>
  <xsl:text>:</xsl:text>
  <xsl:value-of select="format-number(runlength mod 60, '00')"/>

However, I don’t want to have to repeat that everywhere, a) because I’m lazy and don’t want to have to go to the effort of finding the file it’s in every time I want it, and cutting and pasting from there, and b) because I worked that out myself on my second day of playing seriously with XSLT, and I’ll probably discover later that there’s a much better way, or that it doesn’t cope properly with some fringe case, and I won’t want to have to go through and change it everywhere it’s used (see Laziness above). Oh, and c) because it’s always good practice to do this sort of stuff, for lots of reasons that never seemed as convincing as laziness.

Initial research into this persuaded me that there were two main approaches: parameterisation (see Creating Generic XSLT Transforms), or named templates (see The Tao of Recursion: Named Templates in XSLT, or Math and XSLT [with its 'Calculate pi using Leibnitz recursive named template']).

Named templates seemed the best way to go, so I added a secs_to_mins template:

  <xsl:template name="secs_to_mins">
    <xsl:param name="secs" />
    <!-- floor() truncates; format-number(..., '#') would round 350 up to 6:50 -->
    <xsl:value-of select="floor($secs div 60)" />
    <xsl:text>:</xsl:text>
    <xsl:value-of select="format-number($secs mod 60, '00')" />
  </xsl:template>

And then called it:

  <xsl:call-template name="secs_to_mins">
    <xsl:with-param name="secs" select="runlength"/>
  </xsl:call-template>

(It didn’t work at first, as I’d left the $ signs off, but everything was fine once I realised that.)

Then all I had to do was move it into its own file, and place this at the top of any XSL file that wants it:

  <xsl:import href="lib/secs_to_mins.xsl" />

Wahey! Now I can build lots of nice components to play with.

Now I just need to find a good way to play with XSLTunit or equivalent to build up a test suite for these…

Tony

Using AxKit with Template::Toolkit

August 12th, 2002

I spent most of today trying to integrate AxKit with Template::Toolkit.

Unfortunately there isn’t really a lot of information on this available that I could find. I think most people see them as either/or approaches, rather than complementary. There is a mailing list for the two combined, but it’s fairly quiet – and seems to be more interested in using TT as a replacement for XSLT. I wanted to come at it from the other end – use TT to generate the XML, and then apply the XSLT to that XML using AxKit.

So, I decided to just blunder into it all, and see if I could make it work. It really didn’t seem like it should be that difficult.

My first attempt was to merely chain the two together:

  <Location ...>
     PerlHandler MDB::Site->handler AxKit
  </Location>

Surprisingly, everything worked first time. I had TT output a calculated page, containing <?xml-stylesheet href="/xslt/test.xsl" type="text/xsl"?>, and AxKit happily used this to transform the page into a lovely HTML rendition. Or so I thought, until I tried it in IE and it fell over. After a lot of head-scratching I realised that actually AxKit wasn’t even getting near my output, and that Mozilla was applying the transformation client-side! IE was attempting to do the same, but failing.

After some more digging I realised that FireCore was set up to return DONE if it generated output, which would stop Apache even trying to fall through to the AxKit Handler. The server merrily sent out XML and my browser merrily transformed it. And whilst this would be perfect in a few years’ time, it’s not much use now, when most browsers can’t be relied on to do this properly. I wanted to do it server side.

So I changed FireCore to return OK instead of DONE, falling through to AxKit when it was done. But, when you chain together two PerlHandler calls, Apache will just call each as if they were two separate requests – not in sequence. So, whilst FireCore generated the XML from TT just fine, AxKit died horribly complaining that the XML file didn’t exist, outputting a bizarrely hybrid XML/200 error page!

I couldn’t find anything obvious in the Apache or the AxKit docs, and a quick google search didn’t reveal anything either, so I hopped onto the #axkit channel on IRC, and asked there. But everyone must have been at lunch or something, as there was complete silence.

After a little more digging around I discovered that AxKit allowed you to set up your own Provider class for when your source wouldn’t just be on disk. I was slightly perturbed by the “don’t even consider doing this without asking on the mailing list about it first” warnings everywhere, but it was my best approach for now. It actually wasn’t too hard. I just made my own MDB::AxKit::Provider class my Handler, and had it pretend to be the Apache handler and ask MDB::Site for the output for that URL. (FireCore hadn’t really ever expected to be queried in this way, so I had to tease apart a few things that were too tightly coupled, but it was a fairly trivial change, and a worthwhile one anyway.)

But then, the nice people of the AxKit IRC channel all arrived back from wherever they had been, and pointed me to Apache::Filter, which lets you build up a chain of Apache Handlers, each of which operates in turn on the previous one’s output. Even though I’d gotten my own subclass working, this seemed like it would be much cleaner in the long run. Apache::Filter only works with modules which have registered themselves with it, but AxKit has thoughtfully done so, and so all I needed to do, instead of the fancy subclassing, was to tell FireCore to register itself with Apache::Filter also, and then set up my Apache config:

  PerlModule Apache::Filter
  AxConfigProvider Apache::AxKit::Provider::Filter

No actual code needing written is always a good thing, in my books.

And, hey presto! Everything worked. Well, in Mozilla anyway. IE complained: “The XML page cannot be displayed”, as there was an unclosed meta tag.

I examined my XML. I examined my XSLT. I couldn’t find any such tag. I looked at the page source in my browser, and there was indeed a <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"> tag. I examined my XML again. I examined my XSLT again. No sign of that anywhere.

The nice people on #axkit set me straight again. The W3C say that there should be such a tag, so AxKit helpfully inserts it if you haven’t set one up. No-one could work out why IE would care though. After a lot of head-scratching and playing with different things, I discovered that it worked in one IE window, and not in another! It seems IE had decided to cache the fact this file was XML much earlier when the transformation wasn’t happening at all, and was ignoring the fact that it was now HTML. Grrr.

I still didn’t like that my page wouldn’t validate as XHTML however, so it was suggested that I explicitly set an output method in my XSL: <xsl:output method="xml" media-type="text/html" />

And now, everything was wonderful. Apart from the HTML all being one big long line that you had to scroll horizontally forever to read if you wanted to view the source. Another slight modification later, and everything was even more wonderful: <xsl:output method="xml" media-type="text/html" indent="yes" />

They even showed me where to find all this information myself :)

So, now that Marc seems to have the search working again, and I know how to get everything working in an XML-ish fashion, we can get back on track with getting the MusicDatabase back up and running again!

Tony

Work Smarter, Not Harder?

July 25th, 2002

Karen comments on the Tiny Perl Server Pages article in August’s Dr Dobb’s Journal, saying that it seems familiar as “we have implemented a system that works in an similar way to this one.”

On the surface the article may imply this, with its comments on session management, MVC architectures, database tables of pages etc. (the article in the magazine has much more preliminary discussion than the online version), but on closer inspection the TPSP code itself doesn’t really achieve any of this! In fact it’s just a poor imitation of PHP in the JSP/ASP mold.

FireCore (our website development system) is really at 180 degrees to this sort of system. To explain I’ll go back to 1995 when I wanted to start to add some dynamic elements to the websites I’d been building. At that time there were really only two approaches: SSI and CGI.

SSI (Server Side Includes) were ‘special’ HTML tags that I could use within my HTML to execute commands. Most commonly used with the ‘X-bit hack’, all you needed to do to change a plain HTML file to one that got pre-parsed by the server was ‘chmod +x pagename.html’.

Then to, for example, include today’s date, you would write HTML that looks something like this:

<p>Today's date is <!--#echo var="DATE_LOCAL" -->

Or to give the date this file was last modified:

<p>Last modified: <!--#flastmod file="index.html" -->

The other common use was to include the output of a different program (such as a hit counter):

<p>There have been <!--#include virtual="/cgi-bin/counter.pl"--> hits since 1st January 1995.

For simple changeable text within a web page this method was fine, and I used it quite happily for a few months. But it wasn’t enough for pages where the entire content would change – for example a search page. For that I was told I needed to learn CGI.

CGI is basically SSI inside out. Instead of your HTML containing some code that gets included, you write code and bind it in your webserver to be executed when a given URL is requested. Your code then gets executed and generates the HTML that will be emitted back to the browser.
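To make the "inside out" point concrete, here is a minimal sketch of a CGI program in plain Perl. The page content and the sub names are invented for the example; the only real convention it relies on is that the server puts the query string in the QUERY_STRING environment variable and expects a header, a blank line, and then the document on STDOUT:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# The program builds the entire page; no .html file is involved.
sub render_page {
    my ($query) = @_;
    my $shown = escape_html(defined $query ? $query : '');
    return "<html><body><p>You searched for: $shown</p></body></html>\n";
}

# A CGI program receives the query string in its environment and
# writes an HTTP header, a blank line, then the document to STDOUT.
print "Content-Type: text/html\r\n\r\n";
print render_page($ENV{QUERY_STRING});
```

Where the SSI examples had HTML with a little code embedded in it, here the code comes first and the HTML is just its output.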

In some ways this distinction is quite subtle, and basic ASP pages rarely differ that much from basic CGI pages. But as you start to do more advanced things the differences start to get greater, particularly as you start to deal with the problem of separating your presentation from your content. All sorts of people have attempted to describe how an MVC architecture can be implemented with a Server Page approach, but in general they all fall down as the View is driving everything rather than the Controller. With a CGI approach, you can get a better separation of concerns, as the CGI itself can act as the Controller.

Of course just as SSI has come a long way with ASP, JSP, PHP etc., CGI has also come a long way with many different approaches for each language. One of the most common approaches though, and the one that FireCore uses is mod_perl with Apache. Although many people run mod_perl almost exclusively with Apache::Registry, a simple wrapper around traditional CGI scripts, mod_perl is much more powerful than that. In FireCore’s case there are no CGI scripts at all. Each request gets handed to the FireCore::Request factory object, which examines the URI and loads the relevant Controller object which can examine the parameters passed, fetch the correct Model object(s) and pass them on to the View. The goal is that for most pages, all that is needed is the correct template file for the View to parse, but it is possible to intercept at any point in this, and do whatever you need.
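The flow described above can be sketched in a few lines of plain Perl. This is not FireCore's code: every name below is invented for the illustration, and the "model" is just a hash. It only shows the Controller, rather than the View, driving the request:

```perl
use strict;
use warnings;

# Model: stand-in data store (invented for this sketch).
my %cd_for = ( 12039 => { title => 'Example Album' } );

# View: presentation only; it never decides what to fetch.
sub cd_view {
    my ($cd) = @_;
    return "<h1>$cd->{title}</h1>";
}

# Controller: examines the parameters, fetches the Model, hands it to the View.
sub cd_controller {
    my ($id) = @_;
    my $cd = $cd_for{$id} or return '404 Not Found';
    return cd_view($cd);
}

# Factory step: examine the URI and pick the relevant Controller.
my %controller_for = ( '/view/cd' => \&cd_controller );

sub handle_request {
    my ($uri) = @_;
    my ($page, $id) = $uri =~ m{^(.+)/(\d+)$} or return '404 Not Found';
    my $controller = $controller_for{$page} or return '404 Not Found';
    return $controller->($id);
}

print handle_request('/view/cd/12039'), "\n";
```

Because the lookup and the fetch happen before any template is touched, the View can stay free of logic.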

Karen further asks, on this note, if all across the world companies that use Perl for building web based applications are all busy writing the same thing. Well, I don’t think this is at all a phenomenon unique to Perl (in fact I’d say there’s probably more code re-use in the Perl world than with most other languages), but I think it’s probably true. After all no-one else’s code could ever be as good/clear/specific to what we need as my code, could it? And some wheels are just meant to be triangular.

With FireCore we tried to avoid reinventing too many wheels! Really FireCore is just the initial mod_perl handler and a collection of useful generic Controller classes for database-backed web-pages (Viewer, Editor, Eraser, Uploader, Searcher etc.). The Model is handled by Class::DBI, and the View by Template::Toolkit – both well supported, actively maintained, widely used suites of modules available from CPAN.

This lets us build our sites really quickly, freeing up enough time to go reinvent some other wheels instead.

Tony

Building the Music Database

July 24th, 2002

Steve complains that this series hasn’t progressed in a few weeks. The next entry in the series was going to be a look at searching, using Class::DBI::mysql::FullTextSearch. But, when we used it we discovered that DBIx::FullTextSearch (which this uses) just doesn’t scale well. There’s well over 7 million tracks in the Music Database now, and searching them in that way just isn’t fun.

So, we spent a few weeks trying other solutions, and eventually settled on Lucene (which DBIx::FullTextSearch tries to replicate). It’s super lightning fast and solves all our problems. Of course, it’s in Java, so cue much playing with Inline::Java. We eventually got it working, but it’s still not tidied up to the extent that it’s a one-liner in FireCore, so I haven’t been able to write it up yet :(

I’ve also had a few people ask me questions about Class::DBI from this, so whilst we’re waiting, I’m going to put together a basic introduction to moving to Class::DBI.

Normal service should resume shortly. :)

Tony

HTML Abstraction (Building the MDB part 3)

July 2nd, 2002

Last week we built the first few pages for The Music Database, to show how everything hangs together. Before we delve a little deeper into some more complex pages, we’ll look at cleaning up the HTML.

We left our CD details page looking like this:

  [% META browser_title = "CD details" %]
  <h1>[% cd.title %]</h1>
  <h2>
    <a href="/show/artist/[% cd.artist.id %]">
     [% cd.artist.name %]
    </a>
  </h2>
  <p>Length: [% cd.length %]</p>
  <p>Tracks:
     <ol>
       [% FOREACH track = cd.tracks %]
       <li>[% track.title %]</li>
       [% END %]
     </ol>
  </p>

FireCore provides a set of macros specially designed for outputting HTML. Using these, the top half of our page above will now look like:

  [%
     META browser_title = "CD details";
     h1("cdTitle", cd.title);
     h2("artistName",
        link("artistLink", cd.artist.name, "/show/artist/$cd.artist.id"));
     p("cdLength", cd.length);
  %]

So, why bother? What’s wrong with just using HTML?

Well, nothing really. If you prefer, you can just write your page in that manner. As we’ve already seen, it works just fine. But personally, I prefer a little abstraction.

As you’ll have noticed, the HTML we produced from this new version isn’t quite the same as the previous one. We didn’t say h1(cd.title) – we said h1("cdTitle", cd.title). If you’re actually trying out the examples as we go, you’ll have noticed that doesn’t generate the plain <h1> we had before, but adds a CSS class to it: <h1 class="cdTitle">. It’s always a good idea to add CSS to things so that you can style how they look in your stylesheet. But when I’m writing HTML I often forget. I’m too busy trying to ensure that I print all the right things to make sure that I print everything right.

So all our HTML macros will insist we add a CSS style by making it the first argument we pass. If we forget to pass it, and instead wrote something like [% h1("Title") %] this would become <h1 class="Title"></h1> and we’d notice that it was missing from the page and quickly fix the problem.
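FireCore's macros are Template Toolkit macros and aren't reproduced here, but the same convention can be sketched in plain Perl. The sub names below are my own stand-ins; the point is only that the class is always the first argument, so forgetting it produces an obviously empty element:

```perl
use strict;
use warnings;

# Generic tag builder: the CSS class is always the first argument.
sub tag {
    my ($name, $class, @content) = @_;
    my $body = join '', map { defined $_ ? $_ : '' } @content;
    return qq{<$name class="$class">$body</$name>};
}

# Thin wrappers, one per element we care about.
sub h1 { return tag('h1', @_) }
sub h2 { return tag('h2', @_) }

print h1('cdTitle', 'Example Album'), "\n";   # <h1 class="cdTitle">Example Album</h1>
print h1('Title'), "\n";                      # class forgotten: <h1 class="Title"></h1>
```

The builder also closes every tag it opens, which is where the "correct HTML" benefit comes from.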

We can also nest tags easily and neatly:

     h2("artistName",
        link("artistLink", cd.artist.name, "/show/artist/$cd.artist.id"));

This approach also makes sure we write correct HTML. All the macros ensure that tags are correctly closed, in the correct order.

We also get the slight benefit that the HTML we generate is slightly future-proof. Our macros currently output XHTML 1.0. If XHTML 1.4 comes along in a few years’ time and changes how certain tags work, we should only need to change our macros file for this change to take effect across our entire site. Similarly if we wish to emit HTML 4.2.

None of these reasons are probably enough to justify learning what is basically a new language (and a quite tricky one unless you’ve used Lisp or somesuch). But it gives us the grounding for more significant changes.

We’re probably going to be linking to artist pages quite often throughout the site. So, let’s create a macro just for that. FireCore always creates a macros/local file for us in the template tree, so we can just add it in there:

  MACRO link_artist(artist) BLOCK;
    link("artistLink", artist.name, "/show/artist/$artist.id");
  END;

Now, anywhere we want to link to the artist, we can just call this macro, passing it the artist object. So, we can change our CD details template accordingly:

     h2("artistName", link_artist(cd.artist));

Now that the link is suitably abstracted away we never need to remember what CSS class name we’re meant to use when we want a link. This small abstraction alone will probably save me hours over the lifetime of this project as I can never remember things like this and would have to always look it up when I wanted to create a new link. And if we want to change the CSS class name, or how an artist URL is constructed, or add a little icon to the link, it would now be trivial – a change we probably wouldn’t have considered before, unless it was really necessary, once the link was embedded in 30 different templates.

Now we’re starting to see the real benefits. Because setting up our Model and Controller are usually so simple, we’ll spend most of our time creating the View in Template Toolkit. So it’ll be important to find the correct abstractions there – not just of CSS styles and URLs, but of design elements. Good design is consistent across a web site, and by definition will appear on multiple pages. Whilst judicious use of CSS can help with this, many design elements are conceptually larger than a style-sheet rule. TT allows us to abstract these away also. Because I’m not a designer, I won’t be showing you much of this. But over the next few days I’ll show how keeping the templates as clean as possible will make life easier for a designer to make the site look much better than I ever could.

Tony

Building the MDB 2: The Artist Page

June 27th, 2002

Yesterday we built our first page. Because we had to set up some database mappings it may have seemed more complicated than it actually was. If we wanted to add another page today that viewed a CD in a different way, all we would have to do is add another line (the /show/cd one) to the config_info in Site.pm:

  sub config_info {qq{
      /view/cd gets MDB::CD
      /show/cd gets MDB::CD
  }}

And now we just create another template in page/show/cd.tt.

Today we’ll build another similar page, for artist details.

In the database schema we had yesterday, we assumed that “artist” was a field in the CD table. In reality this would usually be a foreign key to an artist table, with, say, ‘artistid’ and ‘name’ columns. So let’s migrate our system to that.

First, we create the MDB::Artist class, telling it what table the data lives in, and setting up the has many relationship back to the CDs by this artist:

  package MDB::Artist;
 
  use base 'MDB::DBI';
 
  __PACKAGE__->set_up_table('artist');
  __PACKAGE__->has_many(cds => 'MDB::CD');
 
  1;

Then we let the CD class know that its ‘artist’ column doesn’t contain the artist name, but rather a pointer to the artist table (the has_a line):

  package MDB::CD;
 
  use base 'MDB::DBI';
 
  __PACKAGE__->set_up_table('cd');
  __PACKAGE__->has_many(tracks => 'MDB::Track');
  __PACKAGE__->has_a(artist => 'MDB::Artist');
 
  1;

Now, any time we ask a cd for its artist we’ll get back an MDB::Artist object instead of a plain string.

So, in our template, where we previously had

<h2>[% cd.artist %]</h2>

we now need

<h2>[% cd.artist.name %]</h2>

And now we can build our artist page. Again it’s very simple. We just add it to our config_info (the /show/artist line):

  sub config_info {qq{
      /view/cd gets MDB::CD
      /show/cd gets MDB::CD
      /show/artist gets MDB::Artist
  }}

And now we can create our artist template, page/show/artist.tt:

  [% META browser_title = "Artist details" %]
  <h1>[% artist.name %]</h1>
  <p>CDs:
     <ul>
       [% FOREACH cd = artist.cds.sort('year') %]
       <li>
         <a href="/show/cd/[% cd.id %]">[% cd.title %]</a>
       </li>
       [% END %]
     </ul>
  </p>

And of course we should really go back to our CD template and link the artist name to their own page:

    <h2>
      <a href="/show/artist/[% cd.artist.id %]">[% cd.artist.name %]</a>
    </h2>

And now we have a simple browsing mechanism from a CD to its artist to all that artist’s CDs.

Tony

Building the MDB 1: The First Page

June 26th, 2002

We already have a database built of the freedb data (it’s a little out of date, but we can resync that later). So the basic version of the first page is fairly simple. We’ll add a page to display the details of a given CD, and use it as an example of how FireCore works.

To construct our Model we use Class::DBI. We’ll see more detail of what’s possible with it later, but for now all we need to do is use it to set up a basic class to represent the CD table. As we’re using MySQL, we’ve set up our MDB::DBI class as a subclass of Class::DBI::mysql, and told it how to connect to the database. Then we can use the fact that Class::DBI::mysql knows how to query the database for the schema of a table, and the only code we need to write to represent the CD is:

  package MDB::CD;
 
  use base 'MDB::DBI';
 
  __PACKAGE__->set_up_table('cd');
 
  1;
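The MDB::DBI base class mentioned above isn't shown in this post. A minimal sketch of what such a class might look like, assuming Class::DBI's set_db method and using a placeholder database name and credentials (none of these values come from the real site):

```perl
package MDB::DBI;

use strict;
use warnings;
use base 'Class::DBI::mysql';

# Placeholder DSN, username and password; substitute your own.
__PACKAGE__->set_db('Main', 'dbi:mysql:mdb', 'username', 'password');

1;
```

Every table class then inherits the connection simply by saying use base 'MDB::DBI'.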

Then we add a line to the Site configuration to create the page to view the cd:

  package MDB::Site;
 
  use base 'FireCore::Site::Decl';
 
  sub config_info {qq{
      /view/cd gets MDB::CD
  }}
 
  1;

Each page that we wish to create for the site will get a line in the config_info.

The syntax for this is quite simple. Here we’re saying that the page “/view/cd” will take an additional argument (e.g. /view/cd/12039) which will represent an MDB::CD object. FireCore will take that argument, validate it, retrieve the MDB::CD with that ID, and pass it to the template.
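FireCore's own parser isn't shown here, but a rough sketch of how "URL gets Class" lines could be read and matched against a request might look like this. The sub names are hypothetical, and the only validation done is checking that the argument is numeric:

```perl
use strict;
use warnings;

# Parse "URL gets Class" lines into a lookup table.
sub parse_config {
    my ($text) = @_;
    my %class_for;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*(\S+)\s+gets\s+(\S+)\s*$/;
        $class_for{$1} = $2;
    }
    return \%class_for;
}

# Split a requested URL into its page and the trailing numeric argument,
# then look up which class that page expects.
sub resolve {
    my ($class_for, $url) = @_;
    my ($page, $id) = $url =~ m{^(.+)/(\d+)$} or return;
    my $class = $class_for->{$page} or return;
    return ($class, $id);
}

my $config = parse_config(qq{
    /view/cd gets MDB::CD
});
my ($class, $id) = resolve($config, '/view/cd/12039');

# A real system would now do something like $class->retrieve($id)
# and pass the resulting object to the template.
print "$class $id\n";
```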

Then all we need to do is set up the template, which will be “page/view/cd”. Tomorrow we’ll look at setting up some basic abstractions for the pages, but for now we’ll just create a simple HTML page.

FireCore automatically creates header and footer templates that will surround each page, so we don’t need to worry about those, other than to set the page title, so we create a template like:

  [% META browser_title = "CD details" %]
 
  <h1>[% cd.title %]</h1>
  <h2>[% cd.artist %]</h2>
  <p>Length: [% cd.length %]</p>

FireCore uses Template::Toolkit as its templating engine, so we enclose our template directives inside [% %]. As we get passed an MDB::CD object this will be available as ‘cd’ inside the template. Because it’s a Class::DBI object, it will automatically have all the columns of the table available as methods on the object. So to print the title of the CD, we simply say: [% cd.title %]. Now we can visit our CD page and make sure everything looks OK.

The most obvious thing missing now is track information. That lives in a different table, so we need to set up a Track class:

  package MDB::Track;
 
  use base 'MDB::DBI';
 
  __PACKAGE__->set_up_table('track');
 
  1;

Simple. And then, we need to tell the CD class that it has a relationship with the Track class (Current versions of MySQL don’t store this sort of information in the schema, so we can’t auto-detect this):

  package MDB::CD;
 
  use base 'MDB::DBI';
 
  __PACKAGE__->set_up_table('cd');
  __PACKAGE__->has_many(tracks => 'MDB::Track');
 
  1;

The only addition is the has_many line, where we tell the CD class to create a method ‘tracks’ which will return a list of all tracks whose ‘cd’ column is the same as our primary key: i.e. it will execute the SQL:

  SELECT *
    FROM track
   WHERE cd = $cdid

Template Toolkit gives us a looping construct that lets us then insert all the tracks into our template (ordered, of course, by their position on the CD):

  <p>Tracks:
     <ol>
       [% FOREACH track = cd.tracks.sort('position') %]
       <li>[% track.title %]</li>
       [% END %]
     </ol>
  </p>

And now we have our simple CD details page. Beyond setting up some basic Class::DBI representations in a purely declarative manner, note that we didn’t write a single line of Perl code. In fact, once you have your tables all mapped in this manner, which in practice is usually part of the set-up phase of the system, rather than done piece by piece like this, you can create quite a lot of your site without ever writing any Perl.

Tony

Building the MDB 0: Installing FireCore

June 26th, 2002

FireCore is much harder to actually set up than it should be. Once it’s all configured for a site it’s wonderful (as we’ll see later), but actually getting to that point is much too complex. There’s a whole range of things that need to be remembered: configuring the database, adding the requisite controls to the web server config files, setting up the directory structure for the templating system, creating the basic Perl subclasses for this site, setting up the revision control systems etc, etc.

This ends up taking at least half a day, and there’s always things that get forgotten or are done not quite right. Unfortunately we can’t even use a deferred install model where each bit only gets set up as it’s needed, as all of these are needed from the outset. We really need to find a way to automate this as much as possible. Most of it’s just cut'n'paste'n'tweak anyway, replacing the domain names and paths with the correct values. Something that asks a few pertinent questions and then just goes and makes everything happen could reduce this sequence to a matter of minutes rather than hours.

Tony