John Markoff wrote in the NY Times earlier this week that Web 3.0 is coming. In some quarters he may as well have said that the sky was falling. The Web 2.0 moniker is still contentious in its own right, and enough scorn has already been poured on companies seeking investment for their Web 3.0 product without someone now trying to lay claim to the name for the next phase of the Semantic Web.
Ross Mayfield is among the people pushing back on this, citing his earlier “headline only” posts: “There is no Web 3.0” and “Web 2.0 is made of people”.
I find myself in the rather interesting position of agreeing, for the most part, with both John and Ross.
For the most part, the whole version numbering of the Web is stupid. Websites in 2003 were wildly different from those in 1999, but no-one then was declaring us to be in the era of Web 1.7 or Web 1.2. But “Web 2.0” let people make a break from the past. A lot of the confusion over what Web 2.0 actually means stems from everyone having something different they wanted to escape. Two of the main camps were web designers and entrepreneurs. Web designers wanted to get away from an era of not being able to do anything interesting because they had to spend so much of their time supporting old, crumbling browsers. Google Maps gave them a big example to cite when arguing with their boss that the technology had been available long enough, and that if anyone couldn’t use it, that was their problem.

Entrepreneurs, meanwhile, wanted to bury the “internet bubble” hangover and start building new and exciting things again. For a few years VCs had been complaining that they had no way to get rid of their money, and that they might have to start returning funds uninvested. Y Combinator was massively increasing the visibility of angel funding on the back of Paul Graham’s relentless evangelism of the start-up religion. Google and Yahoo were on just enough of a buying spree to make first-time entrepreneurs believe that all they had to do was work like mad for three months and then flip. And with bookshelves full of titles like “How To Build Amazon.com in Just 72 Hours with Ruby on Rails”, everything was primed for something to happen.
But I’m certainly not ready to say that anything should be called “Web 3.0”, and certainly not the Semantic Web. Mostly because I think it’s a silly thing to do, and I don’t want the Semantic Web being so easy to dismiss. I think both John and Ross miss the point on this front. Or, perhaps more accurately, they both see it, but don’t really understand it.
Because at its heart, the Semantic Web is more about people than machines. Stories about how, with a little work on setting up the right ontologies, you’ll be able to suddenly merge databases together with the magic fairy dust of RDF are rightly laughable, whether you’re painting grandiose visions of the entire web, or even integrating a single company’s client list with that of its new parent company. The biggest difficulty in any of these projects isn’t technical, and can’t be solved by computers. In the absence of any other identifying information, a computer can never reason out whether, as Clay Shirky asked, your John Smith is the same person as my John Q. Smith.
But whereas this type of problem created the brick wall for old school AI, we live in a new era where, as Aaron Swartz has pointed out, our choices aren’t just million dollar markup or million dollar code. We can now also harness million dollar users. Wikipedia is fast becoming the quickest way to find information on almost anything, including things that would otherwise not be seen as particularly encyclopedic. (I was on holiday in Estonia when the final Formula 1 race of the year was being held in Brazil, and wanted to find out what local TV channel it would be shown on, and at what time. After half an hour of fruitless searching through a variety of TV listing sites, I discovered that the Wikipedia F1 page included a list of which channel broadcast the race in each country, and from there I was quickly able to find what I needed.)
But Wikipedia is essentially unstructured. It has an increasingly elaborate templating system underneath that lets users create “infoboxes” of pseudo-structured data, but trying to write software to extract information from Wikipedia pages is a step back to what extracting information from Amazon was like before they introduced web services – fragile screen-scraping, destined to collapse at the tweak of a page layout.
If it were possible to extract structured information from Wikipedia, the possibilities would be endless. But, of course, extracting structure where there is none is back to million dollar code. There are, however, moves afoot to extend the Wikipedia syntax to allow for the semantic annotation of facts. This would allow the million dollar users to gradually layer structure over the information. This structure would then be queryable, both inside the wiki and outside. There would be no more need to maintain by hand all those “List” pages, like the “List of rivers of Africa”. Each river could just be annotated on its own page as being “in Africa”, and the list page would become a simple query. The page for each river already includes information on what country it is in, what length it is, and so on, but this is purely human readable. With just a slight extension to the wiki syntax, this information can be annotated in place and become machine readable. (In the most popular proposed extension it’s simply a matter of extending “… is a river in [[Zimbabwe]]” to “… is a river in [[location::Zimbabwe]]”.) Most users wouldn’t even need to learn the syntax: Wikipedia has already shown that if someone creates the initial information, a lot of other people will turn up to tidy and rearrange it to fit the current social norms.
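To make the two halves of that concrete, here is a sketch of how annotation and query might fit together, using the inline-query syntax of the Semantic MediaWiki extension (the most prominent of these proposals). The property names (“location”, “length”) and the category are purely illustrative, not Wikipedia’s actual vocabulary:

```wikitext
<!-- On the Zambezi's own page, the existing prose is annotated in place.
     The annotations still render as ordinary links, but the fact is now
     machine readable. -->
The '''Zambezi''' is a river in [[location::Zambia]] and
[[location::Zimbabwe]], with a total length of [[length::2,574 km]].

<!-- The hand-maintained "List of rivers of Africa" page could then be
     replaced by a stored query, in Semantic MediaWiki's #ask syntax: -->
{{#ask: [[Category:Rivers of Africa]]
 | ?location
 | ?length
 | sort=length
 | order=desc
}}
```

Because the query is re-evaluated as pages change, annotating a new river’s own page is enough to make it appear in the list – no-one has to remember to go and update the list page by hand.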
This approach works bottom up, rather than top down. There’s no need to wait for a Standards Committee to construct an elaborate ontology. You can just create new relationship types on the fly, in true wiki fashion, and then work out later how best to describe the actual meaning of the relationship. Standards wonks can come along behind the normal users and annotate the meta-information at one remove to enable more complex queries or reasoning. Purists hate the idea, of course, and there are thousands of reasons why it could never work. But there are thousands of reasons why wikis in general don’t work, and why Wikipedia can’t even conceivably work, and that hasn’t been much of a problem so far. Like most things in computing these days, the biggest issues are really social, not technical, and Wikipedia seems to have found a way to work despite, or indeed perhaps because of, all its social problems. So at this stage I think I’d be willing to take my chances on rewriting the future, and place my bet on it actually being Wikipedia Takes All.