Posts Tagged ‘perl’

Free like a Gift

August 7th, 2006 Tony No comments

There has been some discussion recently on whether the “free” in Free Software is “free like a puppy”, or “free like a flower”. These posts, the comments on them, and the wider discussion in which they participate (on the death of NDoc), examine both why people use free software, and why people release free software.

This ties in neatly with a conversation I was having recently with Karen about her CPAN tutorial at the upcoming YAPC::Europe conference.

Karen was hoping to avoid discussing the question of “Why would I release my code to CPAN?”, and instead focus on the “how”, once someone had already made that decision. I encouraged her to add something about the flipside: “Why might I not want to release it?”

There has been surprisingly little written about uploading code to CPAN in general, so it’s not surprising that there has been even less written about not doing so. CPAN itself is largely silent about the philosophical issues, so I had a look at Sam Tregar’s “Writing Perl Modules for CPAN” book.

Right at the start of Chapter 1 Sam considers the “Why contribute” question. He states: “CPAN is more than just a repository—it’s a community.” One of the primary reasons why you might upload your code to CPAN is therefore because “you’ll come into contact with other highly talented Perl programmers.” And “just as contributing to CPAN enhances a programmer’s resumé”, a company can likewise “improve its hiring ability” by supporting CPAN, as well as “establishing a standard around [their] practices”. There’s also a nod towards idealists contributing as “a good way to save the world”, but this isn’t really explained.

There are some interesting assumptions lurking throughout the book. The first is the notion, seemingly taken as axiomatic, that by contributing to CPAN you’re choosing to become part of a community. There also seems to be the assumption that what you’re uploading is a “project” that you are planning to continue to develop, with implications about your code being good enough to release, along with comments about “competitors who might be close to a release themselves”, and that your job is to “create the next CPAN hit”, and to do this you’ll need to “keep your user community engaged”.

Lurking under all this seems to be the idea that you want your software to be widely used and popular. For authors who actually have this goal, Sam offers lots of good advice, but the book completely ignores the entire class of authors who actually aren’t seeking that. Unfortunately, many of the citizens of the Perl Community share this same blind-spot, and have a tendency to place a series of obligations upon all CPAN authors.

For those authors who see releasing their code to CPAN as part of their journey towards Perl mastery, there is much to be learned from paying careful attention to the wide range of ancillary websites and mailing lists where releases are commented upon, annotated, tested on various platforms, and checked against a 19 point “kwalitee” list, and where bug reports are filed.

For the growing number of authors for whom their CPAN releases are their business, not just in the “making their CV look good” way but because they offer (or hope to offer) consulting services based around them, it becomes even more essential to monitor the feedback from these sites and respond to it quickly.

But for authors who do not share these goals, many of the benefits disappear, leaving what seems to be little more than a growing number of expectations and obligations.

One of the benefits of Free Software is in not having to start from scratch. If you need to write some code, and you can find something close to what you need, with a suitable license, you can take that and adapt it. (If you’re lucky enough to find something that does exactly what you need, then that’s more useful, but you’re actually no better off than if you found suitable free but non-Free Software, unless you want to distribute it). Unfortunately, the Perl Community doesn’t appear to place much value on this freedom. I suspect that at least some of this stems from the difficulties of running modified versions of Perl modules when there is no core way to specify that you want to use an exact version of a module (there are several workarounds available, but they don’t seem to be widely used). So when a CPAN author either rejects or ignores a patch people get more upset than might be expected when Freedom 1 is in place. If a CPAN module doesn’t install or doesn’t compile, most people seem to just write it off and go in search of something better, and never discover that it might actually be perfect for their needs if they just spent 10 minutes fixing a couple of simple problems.

To give a concrete example: back in 2001 I wanted to analyse my telephone bill. BT helpfully provided all the data in CSV format, but rather than just manipulate it in Excel, I decided to do it with Perl. I then released to CPAN the module I wrote, in the hope that if someone else ever wanted to work programmatically with their phone bill, this might help them. It wasn’t a project I was planning to continue to develop, and I certainly wasn’t wanting to build a community of users—I was just offering my work as a gift to anyone else who wanted to save a little time later.

At some stage in the following couple of years, BT changed the format of their CSV file to add some new fields. I didn’t notice, as I was no longer even using the module myself. There was however at least one other user, who noticed that the code was now broken. I wasn’t that interested in fixing the code, or even applying patches to it, not least because I wasn’t using it. As the user in question was Simon Cozens, however, and I trusted him to not do anything horrendous [insert your own joke here], I gave him “co-maintainer” status and let him fix the module and upload a new version of it himself. I suspect he’s no longer using the module himself (or certainly won’t be shortly when he moves to Japan), so if BT change their format again, the module will break again, and again I almost certainly won’t notice. And if someone reports this as a bug, I almost certainly won’t fix it. And if they’re not someone I know well enough to slap with a wet kipper if they mess things up, I probably won’t make them a co-maintainer either. And although I could probably come up with a better design so that when BT change their CSV format the module doesn’t break, I’m not really that interested in doing that either.

To many in the Perl Community these statements are pure heresy. Not only do they show that I’m too selfish, and not a good member of the community, but they spoil the type of advocacy that tells businesses how Perl is great because with CPAN you get all this wonderful code that Just Works everywhere and is well maintained by friendly, receptive, responsive authors who’ll do whatever you need without having to pay them. In other words, the “Free as in beer” approach.

Maybe CPAN would be superficially better with higher quality thresholds, and if authors were explicitly committing themselves to a life of maintenance and servitude (or at least until they could find some other sucker to take over). And maybe authors who aren’t prepared to live up to the expectations of the community should just keep the code to themselves, or just publish it on their blog, or distribute it in some other manner that helps keep CPAN pure. But I personally think CPAN is great because of all the “crap”, not despite it.

Authors who upload to CPAN are offering you a gift. You certainly don’t have to accept it, but you equally don’t have to criticise it or attack the author for not meeting your expectations of what the perfect gift should be.

Bug Fixing Day

December 27th, 2004 Tony No comments

I have over 40 modules on CPAN, and probably average about 1 bug report a week (not including all the discussion that goes on on the Class::DBI mailing list). Most of them are really simple to fix without much effort at all (such as typos in the docs), but like most of these things, if I don’t fix it straightaway, then it’ll be forever before I actually get around to it.

Today I went on a big cleanup and fixed a whole bundle of bugs, some of them reported over a year ago!

I’ve also taken a load of the bug reports that have been emailed to me, which had been filed into about 5 different local mail folders, and forwarded them all off to CPAN’s RT interface instead. I’ve gradually been changing the ‘how to report bugs’ sections of all the modules to direct people to there instead of private email, as it should make it easier for me to keep track of everything. Of course, having forwarded lots of mail to there I now have more open bugs on the system than I did this morning, despite all my fixes, but at least it’s a more realistic snapshot…

Mutiny at the Maypole

November 15th, 2004 Tony No comments

For the last few months I’ve been a slightly more than casual observer of the Maypole project. Although I don’t actually run any sites on Maypole, it’s based on rather a lot of my public code, and a lot of issues people raise on the mailing list aren’t really about Maypole, so much as Class::DBI or CGI::Untaint or Class::DBI::FromCGI or somesuch.

It’s also been interesting to watch as, although it’s not directly connected, Maypole seems to have been heavily influenced by Kasei’s internal FireCore framework which Simon had to work with for the guts of a year. He threw away lots of our mistakes, implemented some of the things we’d talked about but hadn’t yet gotten around to, and, being Simon, took some of the ideas much further than we’d imagined. So I was keen to see how it would hold up in the wild.

Of course one of the first things people wanted was to replace the dependency on Template::Toolkit, and be able to use any of the myriad of Perl templating systems. In hindsight I believe implementing this to have been the first major post-release flaw.

I’ve been a student of web abstraction frameworks for a few years now, and the siren song of many of them is the idea that depending on any given component is a problem. Every part should be capable of being swapped out for an equivalent. In database backed web-based systems in particular this means you shouldn’t be tied to any given database or database abstraction layer, or to any given templating system. I don’t know whether it’s a false hubris, or just a good old fashioned desire for world domination, but there’s a pervasive idea that the framework should be capable of doing anything and everything you desire.

But I’ve come to believe that exactly the opposite is true. Frameworks have most power when they aim to do one thing well, and become tightly entwined around the components that make them up. In Maypole’s case, its initial goal was to make it really trivial to build database backed web systems around Class::DBI and Template::Toolkit. Lots of people build such systems using precisely those components, and they all repeatedly face and solve the same problems, time and again. Maypole neatly bundles up and encapsulates one path of building such sites, freeing developers up from the drudgery to concentrate on the value-added business logic.

But converting such a framework to hot-swappable components restricts the amount of best practice that can be neatly encapsulated. Each RDBMS/OO mapping layer does some things better than others. Each templating system, likewise. A framework committed to one of each can play to their strengths. A framework allowing interchangeability has to reduce itself to the lowest common denominator.

One of the advantages Maypole had over FireCore was a set of default templates. Simon now regrets building these, as they were so good that people believed they should be able to use them for everything, when for most applications most people will need to write their own. But, tied to Class::DBI and Template::Toolkit, Maypole would be able to create much better default templates, making better use of CDBI’s introspection methods, and TT’s VIEWs, MACROs, and Plugins. In a more generalised world this becomes much harder, and the framework becomes blander and blander.

Nothing can be everything to everyone. Too many frameworks try, and end up being used by almost no-one other than the person who wrote it – mainly because they’re the only person who can ever find their way through the maze of twisty abstractions.

Behind my beliefs on this lies one often overlooked truth: frameworks are hard. They’re much more about philosophy than technology. Early design mistakes get magnified dramatically, usually in direct proportion to the level of abstraction you’re aiming at. And the more you build the framework in isolation from the sites you’re building using the framework, the more likely you are to make those mistakes. And once they’re made they’re really hard to recover from.

I once consulted for a company who had what we came to call “three legged cow syndrome”. They had developed a basic content management system that, like many of these things, had morphed into a product that could, of course, do anything any client (and particularly any potential client) wanted. And every time someone came with a request for something the system didn’t already do, they tweaked the system so that it could do that in future. After all, that’s just good practice.

Of course, each time they did this, they codified a hybrid of generic and specific requirements – mainly because they couldn’t know the difference.

Because the first client who ever wanted a certain feature had certain business rules associated with how that feature would work, the code assumed that every client who wanted that feature must work the same way. But of course, every client’s process was different. And, almost invariably, insane. It’s the nature of business systems. They evolve over time to account for all manner of foibles. As technologists we want to simplify these things, and refactor them all away.

But almost every real-world business has all manner of bizarrely crippled logic underlying their business rules. They’re all three legged cows. And in this company every client who later wanted the system to do something slightly differently (as they all did) was treated as if they were insane. Because everyone knows you don’t do that like that – you do it like this. And when they finally prevailed, the developers would have to spend weeks untangling the code to work out what was common and what was unique to each client.

The only way around this syndrome that I know of is to implement the first client’s additions in isolation, as some sort of standalone plugin or add-on. Then the second time someone comes in with something similar do the same, based on what you’ve learned the first time. The third time you can begin to factor out the commonality. Cautiously.

There’s been a lot of furore on this sort of issue recently in the emerging Maypole community. The new maintainer after Simon’s retirement attempted to take the framework too far too quickly. In some ways this speed has probably saved Maypole. By moving at such a rapid pace, no-one could keep up, and people started complaining. If he’d moved much slower, introducing his features one at a time, each one would probably have made seductive sense, and been accepted, until eventually there was nothing left. Frog boiling is a remarkably effective means of changing the direction of any project if you do it slowly and subtly enough.

But now there’s been a fork. Hopefully Sebastian can take the new Catalyst project off to where he wants to go, and Simon F can revert Maypole back to a really good CDBI+TT framework. And hopefully I can avoid being seduced too much, and steal the truly good ideas back for FireCore.

The Joys of CSV

November 12th, 2004 Tony No comments

I’ve been working with CSV files a lot recently, mostly as a way of building web based management information tools out of SAGE data.

But I’ve always really hated working with the interface to Text::CSV_XS. So I put together Text::CSV::Simple. You just point it at the file you want, and read out all the rows:

use Text::CSV::Simple;
# Each row comes back as a listref of that row's fields
my $parser = Text::CSV::Simple->new;
my @data = $parser->read_file($datafile);

You can tell it you only want certain fields:

$parser->want_fields(1, 2, 4, 8);

And that you want the results straight into a hashref rather than just a listref:

$parser->field_map(qw/id name null town/);

There are also trigger points where you can pre- and post-process the data.

It’s certainly made dealing with CSV much easier for me. And it seems to be useful for other people too, as within a few weeks of its release I’ve had several feature requests and bug reports. Usually it takes a couple of months for a new module of mine to build up enough steam to get that.

However, I’ve now had several people all report a problem that I didn’t even consider before: it doesn’t handle newlines in strings. This disturbed me as I hadn’t realised until now that CSV files could actually contain embedded newlines! Of course, I can’t find any sensible documentation anywhere of what the CSV file format actually does and doesn’t allow, as it seems that Microsoft just made it a de facto standard by making it the main export format from Excel, without ever really specifying how it can be used. The few sites that I found that claim to provide more details on the format are contradictory (e.g. over the issue of header rows).

But it certainly does seem that linebreaks are acceptable, as long as they’re properly quoted. This shoots my whole approach to parsing the files apart, and means I’m going to have to go back and pretty much rewrite the module from scratch, and I may even have to lose one of my trigger points, as I still want to use Text::CSV_XS to do the actual parsing for me, but I’ll need to hook in at a different level now.
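To make the problem concrete, here is a rough sketch of why reading one physical line at a time goes wrong, along with one possible way of regrouping lines into records: keep appending lines until the number of double quotes seen so far is even. The sample data and the logical_records helper are invented for illustration. This is not how Text::CSV::Simple or Text::CSV_XS works internally, and it only regroups lines; it does not split them into fields.

```perl
use strict;
use warnings;

# Hypothetical sample: two records, the first with a quoted field
# that contains an embedded newline.
my $csv = qq{1,"Flat 2\n10 High Street",Belfast\n2,"1 Main Road",Derry\n};

# Naive approach: treat every physical line as one record.
my @lines = split /\n/, $csv;

# Quote-aware approach: join physical lines until the quotes balance.
# Doubled quotes ("") add two to the count, so they do not upset the parity.
sub logical_records {
    my ($text) = @_;
    my (@records, $buffer);
    for my $line (split /\n/, $text) {
        $buffer = defined $buffer ? "$buffer\n$line" : $line;
        my $quotes = () = $buffer =~ /"/g;
        next if $quotes % 2;    # odd count: still inside a quoted field
        push @records, $buffer;
        undef $buffer;
    }
    push @records, $buffer if defined $buffer;    # unterminated final record
    return @records;
}

my @records = logical_records($csv);
printf "%d physical lines, %d logical records\n",
    scalar @lines, scalar @records;
```

The naive split sees three lines where there are only two records, which is exactly the mismatch people were reporting.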

Of course I face my normal Open Source dilemma with this. The code clearly has a bug, but it’s not one that has any effect on me. None of the CSV files I have to deal with have linebreaks inside records. If the code wasn’t released, I’d apply my XP YAGNI principles, and defer the fix until I needed it. In some ways I’d like to be able to tell people who reported the bug that I’ll happily accept a patch if they can fix it, but otherwise they’ll have to wait until I need it. But having public code out there with known bugs irks me, so I guess I’ll just have to find the time from somewhere to fix it myself!

::Simple

May 1st, 2004 Tony No comments

As of this morning there are just under 250 modules on CPAN matching ‘::Simple’. I take a certain amount of blame for this. I’ve released a couple myself, and my kitchen has a credit as the birthplace of Test::Simple.

But I like to think that in those cases the module really does deserve the ‘simple’ moniker. Perl’s spreadsheet modules are notoriously complex, and there’s no need to jump through all the hoops of two-dimensional cell access and data vs. formatting if all you want to do is read or write each row as an array. Similarly, Test::Simple has one trivial test function that is all a beginning test-writer needs to get into the way of writing tests, and there’s a clear migration path up to Test::More.
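As a small illustration of how little there is to learn, a complete Test::Simple script might look something like this (the two checks are made up for the example):

```perl
use strict;
use warnings;
use Test::Simple tests => 2;

# ok() takes a true/false value and an optional description.
ok( 1 + 1 == 2,           'addition works' );
ok( lc('PERL') eq 'perl', 'lc() lowercases its argument' );
```

Since Test::More provides a compatible ok(), moving up later should mostly be a matter of changing the use line.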

But somewhere along the line ::Simple seems to have mutated into ::I::Don’t::Like::The::Normal::Syntax. Take DBD::mysql::SimpleMySQL. Ignoring the fact that it’s in completely the wrong namespace as it’s not a driver, I’m at a complete loss to see how it’s “simpler” than, well, anything really.

Now, I’m obviously biased, but even the example in the docs makes my brain hurt:

my $select = ['Passwd.*', 'UsrGrp.UsrGrpName'];
my $from = ['Passwd'];
my $joins = [];
push @{$joins}, join_struct("PasswdHostGrp", "Passwd.PasswdID", "PasswdHostGrp.PasswdID");
push @{$joins}, join_struct("UsrGrp", "Passwd.PrimaryGroupID", "UsrGrp.UsrGrpID");
my $wheres = "PasswdHostGrp.HostGrpID IN ('group1', 'group2')";
my $arrayref = dbselect_array($dbh, build_select($select, $from, $joins, $wheres, 0))

I can understand trying to make it slightly easier to build SQL dynamically, or trying to provide a way to abstract and package up SQL patterns, but I really don’t understand programmers’ fascination with trying to completely reinvent SQL. The basics are really quite straightforward, especially at the level that these modules are able to ‘hide’ from you. And it’s going to be much simpler and more powerful when you eventually need to do something considerably more complex. There’s really no substitute for learning how to use SQL if you’re going to work with databases.
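For comparison, here is roughly what the docs example might look like as plain SQL. This is a guess rather than a verified translation: it assumes each join_struct() call stands for an ordinary JOIN … ON, and it ignores the final 0 argument to build_select(), whose meaning I haven’t checked.

```perl
use strict;
use warnings;

# Assumption: each join_struct() call in the original is a plain JOIN ... ON.
my $sql = <<'SQL';
SELECT Passwd.*, UsrGrp.UsrGrpName
FROM   Passwd
JOIN   PasswdHostGrp ON Passwd.PasswdID       = PasswdHostGrp.PasswdID
JOIN   UsrGrp        ON Passwd.PrimaryGroupID = UsrGrp.UsrGrpID
WHERE  PasswdHostGrp.HostGrpID IN ('group1', 'group2')
SQL

print $sql;

# With a connected DBI handle in $dbh, the rows could then be fetched with:
#   my $rows = $dbh->selectall_arrayref($sql);
```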

In fact I’d say that trying to reimplement SQL is a move in exactly the wrong direction. Especially in a dynamic and flexible language like Perl, it’s good practice to invent little mini-languages for the domain in which you’re working. But, of course when working with databases there’s no need to invent the language – it already exists!

Somehow I doubt this will stop the proliferation of modules trying to do away with SQL. Soon we’ll probably have as many database abstraction libraries as we do templating systems…

I just love it when I can delete code!

November 5th, 2003 Tony No comments

Jim Weirich enthuses about deleting code.

I also love deleting code. In the past month I’ve managed to remove almost 33% of the code from a major work project. It’s been slowly growing in size for over a year, and I’ve managed to get it back to the level it was at in February; although it obviously has much more functionality now.

Much of this was by removing non-obvious duplications (although there were some obvious ones too), removing workarounds for bugs in library code that has since been fixed, and just plain ripping out functionality that’s never actually used. (As much as we like to believe in YAGNI all sorts of things always creep in that seem like they’ll be useful but never actually are, and just turn out to be a complication when you need to extend or modify the code.)

But a large part of it was also finding code that we’d written that we didn’t ever need to, as someone else had already written it. CPAN continues to amaze me more and more every day. Unless you’re working on something really esoteric, 90% of the code you need is probably already available there for you. It’s just a matter of spotting what isn’t really business logic (you usually have much, much more scaffolding than you think), knowing how to find the modules that already provide the functionality, and then gluing it all together.

We were also able to find 3 or 4 things in the code that really weren’t that connected to the actual project, where we would have used a CPAN module if there had been one, rip them out and contribute them back to CPAN. I guess this isn’t really deleting code, but I like to think that it is :)

CGI::FormBuilder

August 7th, 2003 Tony No comments

I’ve been hacking on CGI::FormBuilder this week.

It’s very nifty, looks like it could save me a lot of work, and has one of the best websites for a module that I’ve ever seen.

But. It doesn’t quite do some things that I want. So I thought I’d subclass it, override some of the nasty bits, and everything would be rosy. Nice theory – if it wasn’t for the fact that the core method is about 900 lines long.

The author seems happy to accept patches, but I can see this becoming quite a timesink before I can get what I need from it …

In the Future

March 31st, 2003 Tony No comments

In the future, there will be so much open source software available, programmers will be judged by how much they know about it and how well they can glue it together to build solutions.

dive into mark

In the Perl world this is already the case. The Perl programmer who knows what’s on CPAN has a huge head start on the one who doesn’t…

Pod::Coverage and Overloading

March 3rd, 2003 Tony No comments

I’m a big believer in automated testing. Not just that your code performs the correct functions, but also that the code meets whatever ‘policy’ decisions you’ve chosen. So, for example, at work we can’t check in Perl code that hasn’t provided POD documentation for all its public methods. This is achieved using the wonderful Pod::Coverage module.

But today, when trying to check some code in, I was warned that I hadn’t documented the methods ‘(""’ and ‘(bool’ in a class!

A little digging revealed an interesting facet of Perl that I hadn’t previously been aware of. I was using Perl’s overloading ability, which lets you specify how an object should be treated in various contexts (in this case, when you try to stringify it or check it in a boolean context). This is a really useful ability, but I’ve never examined how it actually works before. It turns out that it does this, in part, by creating functions in your class that are named after the operation you’re overloading, prefixed with a bracket. (Beyond that it gets much scarier!)

So, when Pod::Coverage comes to do its checking (by digging through the symbol table for the class it’s examining to see what methods exist and should be documented) it discovers these strange methods and, finding no documentation, complains.
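A quick way to see this for yourself is to overload a couple of operators in a throwaway class and then list the symbol table entries that start with a bracket. The Temperature class here is invented for the example, and the exact set of extra entries varies between Perl versions, so the sketch only looks for the two that match the overloaded operations:

```perl
package Temperature;
use strict;
use warnings;

# Overload stringification and boolean context.
use overload
    '""'   => sub { $_[0]{degrees} . 'C' },
    'bool' => sub { 1 };

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

package main;

# Anything in the class's symbol table whose name starts with "("
# was put there by the overload machinery, not written by us.
my @odd = sort grep { /^\(/ } keys %Temperature::;
print "$_\n" for @odd;

my $t = Temperature->new(degrees => 21);
print "Stringifies as: $t\n";
```

A tool that walks the symbol table looking for subroutines, as Pod::Coverage does, will come across these entries alongside new.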

Until I see how Richard Clamp decides to work around this in Pod::Coverage I’m now faced with either providing a bizarre piece of documentation or adapting our build script to not complain about this sort of error…

Tips for Using Perltidy

January 10th, 2003 Tony No comments

Things I’ve learned about perltidy.

When returning a hashref from a subroutine, make sure to explicitly give the ‘return’ command, or else perltidy will get confused, think it’s a bare block, and format it strangely.
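A minimal sketch of the form that works, using a made-up defaults() subroutine:

```perl
use strict;
use warnings;

# Explicit 'return': the braces are unambiguously an anonymous hash.
sub defaults {
    return { colour => 'red', size => 'large' };
}

# Without 'return', a trailing { ... } is the case described above,
# where the braces get taken for a bare block:
#
#   sub defaults { { colour => 'red', size => 'large' } }

my $settings = defaults();
print "$_ = $settings->{$_}\n" for sort keys %$settings;
```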

Similarly, if the last statement of a script is complex, with nested subrefs etc, adding the trailing semicolon can change how perltidy indents the code quite significantly…
