Web 2.0

Wikimania 2006

Wikimania 2006, a conference for Wikimedia and Wikipedia people, fans, and advocates, finished up today. I attended as a visitor, to just see the seminars and sessions and soak up enthusiasm about wikis in general.

A separate event held before Wikimania, the Wikimania Hacking Days, brought together many MediaWiki and Wikipedia developers to discuss future directions for the infrastructure and software architecture of Wikipedia. Even though it was held at the offices where I work, the OLPC, I did not attend any of the seminars or hacking sessions. Most were heavily focused on MediaWiki, which I can honestly say I do not like much: the Wiki syntax is awful, and it is slow (I think Wikipedia is the fastest MediaWiki-powered site I know of).

Some of the interesting stuff I liked at Wikimania:

  • Chuck Smith's Wiki Markup Mess poster detailed the many different types of Wiki markup in use, and put forth a "standard" Wiki markup to be adopted by all. I personally think this is the way standards should be made, that is, after the fact, based on things that are already working in the wild. Interestingly enough, ErfurtWiki, which I used on my old website, supports the syntax unification they were proposing.
  • Lawrence Lessig's Ethics of a Free Culture Movement talk was excellent. While the presentation he used was a little corny, it detracted nothing from his message: copyright law has stunted the culture of the last 100 years, and new laws are needed for the new culture of the next 100 years.
  • Markus Krötzsch's Semantic MediaWiki extension, demonstrated as part of the Wikipedia and the Semantic Web panel, was very interesting to me. The lack of structure in wiki information is a pet peeve of mine; semantically tagging bits of information so they can be pulled out of articles with automated tools is just cool.
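
To give a rough idea of what "pulling tagged information out of an article" could look like, here is a minimal Python sketch. It assumes annotations written as [[property::value]], the form I know from later Semantic MediaWiki releases; the version demonstrated at the panel may have used a different syntax. The function name and the sample text are made up for illustration, and this is not how the extension itself works internally.

```python
import re

# Assumed annotation form: [[property::value]] or [[property::value|label]].
# Plain links such as [[Germany]] or [[Category:Foo]] do not match,
# because they lack the double colon.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(wikitext):
    """Return (property, value) pairs found in a piece of wiki text."""
    return [
        (prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


# Hypothetical article fragment, for illustration only.
sample = (
    "Berlin is the capital of [[capital of::Germany]] and is "
    "listed in [[Category:Cities]]. See also [[Germany]]."
)

print(extract_annotations(sample))
```

The point is only that once facts are tagged, a few lines of code can collect them without having to understand the surrounding prose.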

Amazon A9's siteinfo.xml: almost a repeat of favicon.ico

Recently, I've received a few error 404s on a request for "siteinfo.xml." siteinfo.xml is a file used by Amazon's A9 search engine's browser toolbar SiteInfo, and is automatically fetched for every website a user visits.

This sounds pretty similar to Internet Explorer's infamous favorites icon feature. For every site a user visited with Internet Explorer, the browser would automatically request a file called favicon.ico, to be displayed in the browser's location bar and bookmarks. A lot of people were not happy: all of a sudden, web servers were swamped with requests for this mysterious favicon.ico that did not exist. These requests polluted many web server logs, and were very annoying.

On some sites, especially dynamic ones, 404 errors are very expensive. Unfortunately this is true of most Drupal-powered sites, including mine. With Drupal's "pretty URLs," which use Apache's mod_rewrite to, well, make URLs pretty, every request that the web server cannot serve itself (including requests for files that do not exist) goes through Drupal. Going through Drupal means a long bootstrapping process to initialize Drupal and load all its modules, and at least one database query, just to find out that a URL does not exist and return an error 404. Too many requests for a non-existent file can basically become a DoS attack.
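
The general way to take the sting out of this is to answer known probe requests before the expensive application ever starts. Below is a minimal sketch of that idea in Python, written as WSGI middleware rather than anything Drupal-specific; the path list, the function names, and the stand-in "expensive" application are all assumptions made for illustration. On a real Drupal site the equivalent would more likely be a web server rule or an empty placeholder file.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical list of paths that toolbars and browsers request on their own.
PROBE_PATHS = frozenset({"/siteinfo.xml", "/favicon.ico"})

# Counts how often the expensive application was actually invoked.
calls = {"expensive": 0}


def expensive_app(environ, start_response):
    """Stand-in for a CMS that bootstraps and hits the database on every request."""
    calls["expensive"] += 1
    body = b"Not Found (after full bootstrap)"
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [body]


def cheap_404(app, paths=PROBE_PATHS):
    """Wrap a WSGI app so that requests for known probe paths never reach it."""
    def middleware(environ, start_response):
        if environ.get("PATH_INFO", "") in paths:
            body = b"Not Found"
            start_response(
                "404 Not Found",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everything else is passed through untouched.
        return app(environ, start_response)
    return middleware


def request(app, path):
    """Issue a fake GET request against a WSGI app and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


site = cheap_404(expensive_app)
print(request(site, "/siteinfo.xml"), calls["expensive"])
print(request(site, "/no-such-page"), calls["expensive"])
```

Both requests still end in a 404, but only the second one pays for the full application; the probe for siteinfo.xml is answered by a set lookup.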

It seems Amazon's A9 developers didn't get the memo that people don't like tools that request files that don't exist.

Granted, it's not too bad: I don't think this toolbar has much market penetration, so it's not as if millions of people are killing my site. The siteinfo.xml specification page also mentions that the file is fetched through A9 and cached, so the file will not be requested for every user that visits, but only for the first one.

Kudos to Amazon's programmers for being a bit brighter than Microsoft's, but eh, I can't say how much brighter, given that they designed a system a bit too similar to the favicon.ico debacle.
