Human vs Machine: What’s Better In Search?


The next few months should be interesting to watch: on Monday, Wikia Search goes online, adding another powerful player to the next wave of the search engine wars.

For the last few years, Google with its (mostly) machine-based search algorithms has been the dominant player in the search market, producing more or less the best results by exploiting the inherent value of hyperlinks: if website authors or bloggers link to another website, the basic idea goes, they endorse that website, i.e. they consider it relevant in one way or another.
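The link-as-endorsement idea can be sketched in a few lines of code. This is only a toy illustration in the spirit of PageRank, not Google's actual algorithm (which is far more complex and secret); the graph, function name, and damping value below are illustrative assumptions.

```python
def rank(links, iterations=50, damping=0.85):
    """Toy link-based ranking: links maps each page to the pages it links to.

    Each page repeatedly passes a share of its score to the pages it
    endorses (links to), so pages with many incoming links score higher.
    """
    pages = list(links)
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline; the rest flows along links.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue
            share = damping * score[page] / len(outgoing)
            for target in outgoing:
                new[target] += share
        score = new
    return score

# A tiny hypothetical web: "c" is linked by both "a" and "b",
# so it collects the most endorsements and the highest score.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = rank(web)
```

The point of the sketch: relevance falls out of the link structure alone, with no human editor in the loop — which is exactly what the human-powered contenders below are pushing back against.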

Now the humans are pushing their way back into search: in 2007, Jason Calacanis' Mahalo introduced a completely human-powered search engine that produces great results but covers only a relatively small number of search terms. (For terms that aren't covered, Mahalo forwards users to a Google search.) Robert Scoble already suspects that Mahalo, Techmeme and Facebook (i.e. search based on your social graph) will kick Google's butt.

On Monday, Jimmy Wales' Wikia Search goes into public beta. Wikia Search aims to make its search algorithms open and transparent, so that, unlike the black box that is Google, they won't be as easily manipulated by SEO efforts.

What these projects have in common, as Tim O'Reilly points out, is that "both are trying to re-draw the boundary between human and machine." How this hybrid works out will determine both the quality of our search results (and thereby the way we perceive a great many things around us) and our defense against spam.

By the way, even Google doesn't rely on machines alone, but has to intervene manually for some search terms:

(…) there is a small percentage of Google pages that dramatically demonstrate human intervention by the search quality team. As it turns out, a search for “O’Reilly” produces one of those special pages. Driven by PageRank and other algorithms, my company, O’Reilly Media, used to occupy most of the top spots, with a few for Bill O’Reilly, the conservative pundit. It took human intervention to get O’Reilly Auto Parts, a Fortune-500 company, onto the first page of search results. There’s a special split-screen format for cases like this.

So why is this necessary if there is such a powerful algorithm? Cory Doctorow writes:

The idea of a ranking algorithm is that it produces “good results” — returns the best, most relevant results based on the user’s search terms. We have a notion that the traditional search engine algorithm is “neutral” — that it lacks an editorial bias and simply works to fulfill some mathematical destiny, embodying some Platonic ideal of “relevance.” Compare this to an “inorganic” paid search result of the sort that Altavista used to sell. But ranking algorithms are editorial: they embody the biases, hopes, beliefs and hypotheses of the programmers who write and design them.

So where Google puts its money on math-fu, and Mahalo on editorial filters, Wikia Search focuses on transparency and a Wikipedia-inspired community model to open up that Google black box. Which hybrid will bring us the best results and decide which information we get to see? 2008 will be the year that tells us. Let the battle begin!

Tim O’Reilly tells his parents: What’s Web 2.0?


Link: sevenload.com

At Web 2.0 Expo Berlin, Tim O'Reilly kindly agreed to try solving the one issue we all share: how do we tell our parents what we do? So here's Tim, explaining to his parents: What's Web 2.0?

“So Web 2.0: First off, it’s the idea that the Web, rather than the personal computer, is the most important platform for computer applications today. The applications that matter to people are no longer things like Microsoft Word or a spreadsheet. It’s things like Google, or Amazon, or if you’re a kid maybe it’s MySpace.

When we think about this, though, we have to realize whenever we have a new platform in the computer industry things work differently. So we started thinking about what makes the Web different.

And what makes the Web different is that on the network there’s the potential to build applications that actually get better the more people use them. And they actually grow organically in kind of a conversation with the users.

You can look at this with every major company that succeeded on the Web. They’re all in some way using the network to harness collective intelligence, to get better by user contribution. And I think that’s really the heart of Web 2.0.

As to why we call it Web 2, it was really after the Dotcom Bubble, and the Dotcom Bust, which was really a stock market phenomenon, everybody thought maybe the Web was over, and we thought it was still going. So Web 2 really refers not to a new technology so much as the second coming of the World Wide Web as an important technological phenomenon.”

Disclosure: Once again, this is a cross-posting from my work for Blogpiloten.de. Tim was great, taking the time for this little experiment although he was clearly pressed for time. Thanks a lot, Tim! The video is released under a Creative Commons license (by-nc-sa 2.0).

Update, 8 Nov: I just got notice that this video made the Web 2.0 Expo website front page. Yay!