John Markoff wrote in the NY Times earlier this week that Web 3.0 is coming. In some quarters he may as well have said that the sky was falling. The Web 2.0 moniker is still contentious in its own right, and enough scorn has already been poured on companies seeking investment for their Web 3.0 product, without someone now trying to lay claim to the name for the next phase of the Semantic Web.
Ross Mayfield is among the people pushing back on this, citing his earlier “headline only” posts: “There is no Web 3.0” and “Web 2.0 is made of people”.
I find myself in the rather interesting position of agreeing for the most part with both John and Ross.
For the most part, the whole version numbering of the Web is stupid. Websites in 2003 were wildly different from those in 1999, but no-one then was declaring us to be in the era of Web 1.7 or Web 1.2. But “Web 2.0” let people make a break from the past. A lot of the confusion over what Web 2.0 actually means really stems from everyone having something different they wanted to escape. Two of the main splits came from web designers and entrepreneurs.

Web designers wanted to get away from an era of not being able to do anything interesting because they had to spend so much of their time supporting old crumbly browsers. Google Maps gave them a big example to cite when arguing with their boss that the technology had been available long enough, and if anyone couldn’t use it, that was their problem.

Entrepreneurs wanted to bury the “internet bubble” hangover and start building new and exciting things again. For a few years VCs had been complaining that they had no way to get rid of their money, and that they might have to start returning funds, uninvested. Meanwhile Y Combinator was massively increasing the visibility of angel funding on the back of Paul Graham’s relentless evangelism of the start-up religion. Google and Yahoo were on just enough of a buying spree to make first-time entrepreneurs believe that all they had to do was work like mad for three months and then flip. And with the bookshelves full of titles like “How To Build Amazon.com in Just 72 Hours with Ruby on Rails”, everything was primed for something to happen.
Naming is important. Most of these things had been ticking away for quite some time. Just calling the technical side “Ajax” sparked much more interest and development than having to explain what XMLHttpRequest could do for your JavaScript. Calling the business side “Web 2.0” had much the same result, even though, in both cases, the terms are still mutating daily. The terms have great value, even if they’re mostly meaningless.
But I’m certainly not ready to say that anything should be called “Web 3.0”, and certainly not the Semantic Web. Mostly because I think it’s a silly thing to do, and I don’t want the Semantic Web to be that easy to dismiss. I think both John and Ross miss the point on this front. Or, perhaps more accurately, they both see it, but don’t really understand it.
Because at its heart, the Semantic Web is more about people than machines. Stories about how, with a little work on setting up the right ontologies, you’ll be able to suddenly merge databases together with the magic fairy dust of RDF are rightly laughable, whether you’re painting grandiose visions of the entire web, or even integrating a single company’s client list with that of its new parent company. The biggest difficulty in any of these projects isn’t technical, and can’t be solved by computers. In the absence of any other identifying information, a computer can never reason out whether, as Clay Shirky asked, your John Smith is the same person as my John Q. Smith.
But whereas this type of problem created the brick wall for old school AI, we live in a new era where, as Aaron Swartz has pointed out, our choices aren’t just million dollar markup or million dollar code. We can now also harness million dollar users. Wikipedia is fast becoming the quickest way to find information on almost anything, including things that would otherwise not be seen as particularly encyclopedic. (I was on holiday in Estonia when the final Formula 1 race of the year was being held in Brazil and wanted to find out what local TV channel it would be shown on, and at what time. After half an hour of fruitless searching through a variety of TV listing sites, I discovered that the Wikipedia F1 page included a list of which channel broadcast the race in each country, and from there was quickly able to find what I needed.)
But Wikipedia is essentially unstructured. It has an increasingly elaborate templating system underneath that lets users create “infoboxes” of pseudo-structured data, but trying to write software to extract information from Wikipedia pages is a step back to what extracting information from Amazon was like before they introduced web services – fragile screen-scraping destined to collapse at the tweak of a page layout.
If it were possible to extract structured information from Wikipedia, the possibilities would be endless. But, of course, extracting structure where there is none is back to million dollar code. There are, however, moves afoot to extend the Wikipedia syntax to allow for the semantic annotation of facts. This would allow the million dollar users to gradually layer structure over the information. This structure would then be queryable, both inside the wiki and outside. There would be no more need to maintain by hand all those “List” pages, like the “List of rivers of Africa”. Each river could just be annotated on its own page as being “in Africa”, and the List page would become a simple query. The page for each river already includes information on what country it is in, what length it is, etc. But this is purely human-readable. With just a slight extension to the wiki syntax, this information can be annotated in-place and become machine-readable. (In the most popular extension it’s simply a matter of extending “… is a river in [[Zimbabwe]]” to “… is a river in [[location::Zimbabwe]]”.) Most users don’t even need to learn the syntax. Wikipedia has already shown that if someone creates the initial information, a lot of other people will turn up to tidy and rearrange it to fit the current social norms.
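To get a feel for what “the List page is just a query” means, here’s a minimal sketch using Python and rdflib. Everything in it is invented for illustration (the URIs, property names and data are not Wikipedia’s or Semantic MediaWiki’s actual export vocabulary), but it shows how in-place annotations become something you can query rather than a page you maintain by hand:

```python
from rdflib import Graph, Namespace

# Invented namespace standing in for annotations exported from the wiki.
EX = Namespace("http://example.org/wiki/")

g = Graph()

# "... is a river in [[location::Zimbabwe]]" becomes a triple;
# a second annotation places each country on its continent.
g.add((EX.Zambezi, EX.location, EX.Zimbabwe))
g.add((EX.Limpopo, EX.location, EX.Zimbabwe))
g.add((EX.Thames, EX.location, EX.England))
g.add((EX.Zimbabwe, EX.partOf, EX.Africa))
g.add((EX.England, EX.partOf, EX.Europe))

# The hand-maintained "List of rivers of Africa" page collapses to a query.
rivers_in_africa = g.query(
    """
    SELECT ?river WHERE {
        ?river ex:location ?country .
        ?country ex:partOf ex:Africa .
    }
    """,
    initNs={"ex": EX},
)

for row in rivers_in_africa:
    print(row.river)   # -> Zambezi, Limpopo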
This approach works bottom-up, rather than top-down. There’s no need to wait for a Standards Committee to construct an elaborate ontology. You can just create new relationship types on the fly, in true wiki fashion, and then work out later how best to describe the actual meaning of the relationship. Standards wonks can come along behind the normal users and annotate the meta-information at one remove to enable more complex queries or reasoning. Purists hate the idea, of course, and there are thousands of reasons why it could never work. But there are thousands of reasons why wikis in general don’t work, and why Wikipedia can’t even conceivably work, and that hasn’t been much of a problem so far. Like most things in computing these days, the biggest issues are really social, not technical, and Wikipedia seems to have found a way to work despite, or indeed perhaps because of, all its social problems. So at this stage I think I’d be willing to take my chances on rewriting the future, and place my bet on it actually being Wikipedia Takes All.
Danny, I think you’ve missed my point. I’m not dismissing RDF et al – far from it. I’m a believer in the Semantic Web – that’s why I don’t want it to be attached to a stupid “Web 3.0” meme. I’m well aware of Semantic MediaWiki – I’ve already posted here several times about doing things like building financial systems on top of it, and although I now prefer the Socialtext wiki, there’s no way we could migrate our company’s wiki to it due to the lack of support for these features. I’m well aware that you’re doing lots of interesting and exciting work in this area – but we’re still a very, very long way away from the ‘system that can give a reasonable and complete response to a simple question like: “I’m looking for a warm place to vacation and I have a budget of $3,000. Oh, and I have an 11-year-old child.”’ cited in the NYT article, or the examples in TBL’s SciAm article. I believe we’ll get there eventually, but my point is that, although we still need to solve some important technical problems, the main barriers are going to be social, not technical.
I agree with many of your points, and believe you’re absolutely right to emphasize the people angle. But I don’t see why you dismiss RDF and related technologies; they can enable a lot of exactly the kind of thing you’re talking about. There are a whole range of practical applications around data integration that don’t need fairy dust.
As you mention, people are starting to use wikis for semantic annotations and the like, mostly using RDF – notably Semantic MediaWiki, which builds on the same codebase as Wikipedia. The Wikipedia data itself is available in RDF.
Regarding your example – integrating a single company’s client list with that of its new parent company. Sure, without additional information you can’t merge sets of names. But chances are there would also be the individuals’ email addresses, and a merge would be entirely feasible. RDF offers a framework through which you could do it. In that particular case you wouldn’t have to do any work on setting up ontologies, because FOAF already covers this, perhaps with a few terms added from the vCard vocabulary; one or two new terms might be needed to cover the company’s specific requirements.
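As a rough sketch of that kind of merge in Python with rdflib (the data, the identifiers and the extra company-specific term are all invented for illustration; only foaf:name and foaf:mbox come from the real FOAF vocabulary), the email addresses act as the join key, since FOAF treats foaf:mbox as identifying exactly one person:

```python
from rdflib import Graph, Namespace, URIRef, Literal

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
# Hypothetical company-specific vocabulary for the one extra term needed.
ACME = Namespace("http://example.com/acme/")

# Client list from the original company.
ours = Graph()
ours.add((URIRef("http://example.com/ours/1"), FOAF.name, Literal("John Smith")))
ours.add((URIRef("http://example.com/ours/1"), FOAF.mbox, URIRef("mailto:jsmith@example.org")))

# Client list from the new parent company, with its own identifiers
# and one company-specific relationship.
theirs = Graph()
theirs.add((URIRef("http://example.com/theirs/42"), FOAF.name, Literal("John Q. Smith")))
theirs.add((URIRef("http://example.com/theirs/42"), FOAF.mbox, URIRef("mailto:jsmith@example.org")))
theirs.add((URIRef("http://example.com/theirs/42"), ACME.accountManager, Literal("A. Jones")))

# Merging RDF graphs is just taking the union of their triples.
merged = Graph()
for triple in ours:
    merged.add(triple)
for triple in theirs:
    merged.add(triple)

# Records sharing a mailbox can be treated as the same individual.
# (The FILTER only removes self-matches, so each pair appears in both orders.)
duplicates = merged.query(
    """
    SELECT ?name1 ?name2 WHERE {
        ?a foaf:mbox ?m ; foaf:name ?name1 .
        ?b foaf:mbox ?m ; foaf:name ?name2 .
        FILTER (?a != ?b)
    }
    """,
    initNs={"foaf": FOAF},
)

for name1, name2 in duplicates:
    print(f"{name1} and {name2} appear to be the same person")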
Note that last point. RDF and Semantic Web technologies are designed so that you don’t need a top-down ontology. It just offers a common, fairly low-level data language. With RDF you can just create new relationship types on the fly. RDF is agnostic on how (in human terms) the relationships are created.
In my day job we’ve just finished a not-dissimilar system (to the point of getting it working; there’s still lots of cleanup & optimisation to do). It’s for medical records: a doctor runs a query about a patient and gets the results back. From the user’s point of view it’s very like a traditional database app. But what’s happening behind the scenes is that data is obtained from various remote, completely independent databases and mapped to a common RDF model. The specific query is then done as SPARQL against that. It’s not laughable, it’s functional.
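To give a feel for the shape of it (this is a toy sketch in Python with rdflib, not our actual code or data model; the two in-memory “databases”, the vocabulary and the query are all made up), each source gets mapped into a common RDF model, and the doctor’s question then runs as a single SPARQL query over the result:

```python
from rdflib import Graph, Namespace, URIRef, Literal

# Invented clinical vocabulary; a real system would use an agreed one.
MED = Namespace("http://example.org/med/")

# Two independent "databases", each with its own shape and field names.
lab_system = [{"patient_id": "p-101", "test": "HbA1c", "result": "6.1%"}]
pharmacy_system = [{"pid": "101", "drug": "Metformin", "dose": "500mg"}]

graph = Graph()

# Map each source into the common model; the only agreement needed is
# on how to mint a URI for the patient.
for row in lab_system:
    patient = URIRef("http://example.org/patient/" + row["patient_id"].split("-")[1])
    graph.add((patient, MED.testName, Literal(row["test"])))
    graph.add((patient, MED.testResult, Literal(row["result"])))

for row in pharmacy_system:
    patient = URIRef("http://example.org/patient/" + row["pid"])
    graph.add((patient, MED.prescribedDrug, Literal(row["drug"])))
    graph.add((patient, MED.dose, Literal(row["dose"])))

# The doctor's question becomes one SPARQL query over the merged model.
results = graph.query(
    """
    SELECT ?p ?v WHERE {
        <http://example.org/patient/101> ?p ?v .
    }
    """
)

for p, v in results:
    print(p, v)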
This app is fairly domain-specific and because of the nature of the data it’s not going to be on the web. But thanks to various design points (especially using URIs to identify things) the Resource Description Framework is eminently suited for doing data integration on the web.
Tony,
Another excellent post. A great summary of where we’ve been and where the web is going.
One of my favorites from “The Devil’s Dictionary 2.0.”