Wikispeedia

For those interested, here is some information about Wikispeedia's research background. I did this work as part of my Master's degree in computer science at McGill University in Montreal, Canada.

For a more in-depth description of the project, please refer to the following papers:

Data related to the project can be found here and here.

On July 21, 2010, the Philadelphia Inquirer published an article that talks about Wikispeedia.

What problem is Wikispeedia trying to address?

The general purpose of Wikispeedia is to automatically learn commonsense knowledge.

For instance, every child knows that a wheel is the part of a car which touches the ground and that wheels usually come with tires mounted on them. Computers don't know these things. But I think they should, since computers that possess this kind of knowledge are simply much more useful. Just think of an automatic question-answering system: You ask your computer, "I've got a flat tire. Where can I get it fixed?" It will be hard for your computer to help you if it doesn't know that the tire you are talking about is a part of your car and that it should consequently look up the addresses of nearby car garages.

Why a game?

Reliable commonsense-knowledge data is hard to come by: You have to get humans to enter it into a computer system, either by relying on their willingness to contribute voluntarily (e.g. MIT's Open Mind Common Sense project) or by paying them money (e.g. Princeton's WordNet project).

Recently, a third alternative has been explored: Pay people with fun rather than money, by having them play games which are enjoyable and at the same time produce valuable data that can be used in machine learning applications. The Google Image Labeler is an early example, and Wikispeedia follows the same paradigm.

How does Wikispeedia make use of the recorded game traces?

When you play a game of Wikispeedia you describe a trace through Wikipedia as you're hopping from article to article. Wikispeedia tries to learn such facts as "CARs have WHEELs" by recording your traces and by then analyzing these data.

Currently, Wikispeedia learns knowledge that has a slightly simpler structure. The first goal is not to learn in what exact way two things are related (e.g. "A WHEEL is part of a CAR") but rather to what degree two things are related (e.g. WHEEL and CAR are highly related words, whereas WHEEL and BAR are not).

The fundamental idea is pretty straightforward: When you try to reach the article that talks about a certain thing (e.g. WHEEL) you are likely to do so by going through at least one article (often many more) that talks about a related thing (e.g. CAR). As Wikispeedia sees more and more games that have the same target article, its guess about which words are related to the target word becomes more and more reliable.

Here's a real example: JUICE was the target of seven games that have been recorded. In six of these games CITRUS was the article from which the player hopped to the target JUICE, while in only one of the games did the player choose to go through LEMON. The articles through which CITRUS and LEMON were reached are VITAMIN C, FRUIT (three times each) and FLORIDA (once). Here is the ranking of words related to JUICE that was automatically computed by Wikispeedia:

  1. CITRUS (6 times at distance 1 from JUICE)
  2. FRUIT (3 times at distance 2 from JUICE)
  3. LEMON (1 time at distance 1 from JUICE)
  4. VITAMIN (1 time at distance 3 from JUICE)
  5. VITAMIN C (3 times at distance 2 from JUICE)
  6. WINE (2 times at distance 3 from JUICE)
  7. FLORIDA (1 time at distance 2 from JUICE)
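The "N times at distance D" figures in such a ranking can be tallied with a few lines of code. The following is only a minimal sketch of that counting step, not Wikispeedia's actual implementation: the function name `tally_by_distance` is hypothetical, and the traces are invented so that they agree with the distance-1 and distance-2 counts quoted above (the real recorded games are not listed here, so the pairing of FLORIDA with LEMON is an assumption, and the distance-3 articles are left out). It does not reproduce how the final ranking is ordered.

```python
from collections import Counter


def tally_by_distance(traces, target):
    """Count how often each article occurs at each distance from the target.

    Each trace is a list of article titles in the order they were visited.
    Only traces that end at `target` are counted.
    """
    counts = Counter()
    for trace in traces:
        if not trace or trace[-1] != target:
            continue
        # Walk backwards from the target: the article visited
        # immediately before it is at distance 1, and so on.
        for distance, article in enumerate(reversed(trace[:-1]), start=1):
            counts[(article, distance)] += 1
    return counts


# Invented traces (an assumption), chosen only to match the counts
# quoted in the text for distances 1 and 2.
traces = (
    [["VITAMIN C", "CITRUS", "JUICE"]] * 3
    + [["FRUIT", "CITRUS", "JUICE"]] * 3
    + [["FLORIDA", "LEMON", "JUICE"]]
)

counts = tally_by_distance(traces, "JUICE")
for (article, distance), n in sorted(counts.items()):
    print(f"{article}: {n} time(s) at distance {distance} from JUICE")
```

A tally like this gives the raw evidence only; turning it into a relatedness score requires weighing frequency against distance in some way.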

In fact, there is more to it than merely measuring distances between articles along game traces. If you are interested in a more detailed description of my ideas, you can look at the paper linked above or just send me an email:

Last modified on January 21, 2014