Eugene Kim raises some interesting questions about the addressibility of content from on-line spreadsheets in a blog post: “Spreadsheets 2.0 and Transclusions”
Some of this is similar to issues Danny Ayers raised about “WikiCalc and the Semantic Web” a year ago.
Both, however, fall into a trap that is a common source of spreadsheet woe: the difference between $F$23 and F23.
Using $F$23 says: any time you want to look up my value, get me whatever is in column 6, row 23.
Using F23, on the other hand, says: get me the data that is currently in column 6, row 23, but it’s the data that’s important, not the cell, so if that data moves, then adjust accordingly.
You see, the contents in a spreadsheet are fairly fluid. Rows and columns get inserted or deleted. Values get moved about – either just for aesthetic reasons, or because sorts the data. Behind the scenes the spreadsheet engine goes to a lot of work to make sure that when things change every reference is adjusted so point to the new location of the data they want.
But, of course, in the case of a remote web page somewhere pointing at http://spreadsheets.com/fred/sales_data_2006_Q4/K23, the spreadsheet engine has no ability at all to tell that page that what used to be in K23 has suddenly moved to L16.
Although it’s rare that what we *really* want is $K$23, that’s all the constructing a URL in this way can give us. Now, of course we could come up with workaround to this, from the low-tech approach of just declaring when a sheet is ‘final’ and won’t change, and so is suitable for such referencing, through to a complex event publishing model where if you’re going to refer to a sheet’s contents you also need to subscribe to its feed of changes so that you can adjust your URLs accordingly.
But neither of these things should be necessary. Spreadsheets already have the concept that is needed here: named ranges. Rather than addressing a cell, or group of cells, by their loation, and needing to adjust when that location cells, almost all modern spreadsheets allow you to give a (possibly) meaningful name to the data instead: “sales_tax_rate”, “Q3 estimates”, “xyzzy”, whatever.
So instead of making cell “C4 = B4 * M21”, you can say that “C4 =B4 * vat_rate”. And if M21 happens to move, well, we don’t care and don’t need to change anything. In a traditional spreadsheet this makes it much easier to follow what’s going on when constructing complex worksheets, but on-line it has a huge additional bonus: the added level of indirection makes it the responsibility of the spreadsheet to keep track of where the important data is, rather than the referrer.
And with a URL more akin to: http://spreadsheets.com/fred/sales_data_2006_Q4/west_coast_total, we’re protected from any rearrangement of data.
This still leaves a couple of different problems. Firstly, named ranges doesn’t exactly have a lot of support amongst the main online spreadsheet tools at this point. Presumably this will change soon, as serious spreadsheet users rely heavily on them.
More interestingly, it still leaves a gap where the spreadsheet author hasn’t given a name to the data you actually want access to. Unlike the PurpleNumbers approach, it’s difficult to even automatically give non-meaningful names to everything, as the number of possible ranges in a sheet gets very big very fast (not just A1, A2, B1, B2 etc, but also A1:A2, A1:B1, A1:B2, etc, and even dynamic ranges). It would be like having to give PurpleNumbers to every combination of letters in a document: not just each paragraph.
But even if people were to start naming the ‘important’ data in their sheets, and that data was then addressible with a normal URI, we get a huge amount of value. On that front both Eugene and Danny are completely correct. Although this naming issue distracts from their central issue, it certaily shouldn’t be seen to detract from it. We’re entering a world of addressible data: let’s just make sure we get it right.
The long term vision for purple numbers was that they would be UUIDs of some kind across the entire world of all documents, not just a single document. That does get out of hand for a person after a while, but is a foundational thing that computers and networks can be pretty good at keeping sane.
Just consider a spreadsheet a series of immutable versions of the same object, where version has a distinct identifier that’s baked into its URI space. If you make any change to the spreadsheet, a new version with a new version identifier is created. If someone asks for the value of a cell/-range in an old URI space, you send them a permament HTTP redirect to the next newest version.
No need for subscriptions or other heavy engineering. You do need to keep a version history of the structure, if not the content, of the document, but if the concept we are talking about is a cross between wikis and spreadsheets, then a version history is already part of the picture anyway.