SECTION II: BUILDING MANY WORLDS
Chapter 5 - Information Overload: Database Aesthetics
5.9 Archiving the Internet
5.9.1 A fierce competitor to Corbis is Brewster Kahle, a thirty-seven-year-old programmer and entrepreneur who has been capturing and archiving every public Web page since 1996. His ambitious archival project aims to create the Internet equivalent of the Library of Congress. Kahle's non-profit Internet Archive serves as a historical record of cyberspace. His for-profit company, Alexa Internet, named after the Library of Alexandria, uses this archive as part of an innovative search tool that lets users call up "out-of-print" Web pages. Along with the actual pages, the programs retrieve and store "metadata" as well: information about each site such as how many people visited it, where on the Web they went next, and what pages are linked to it. The Web pages are stored digitally on a "jukebox" tape drive the size of two soda machines, which contains ten terabytes of data, as much information as half of the Library of Congress. And in keeping with the Library of Congress, the Internet Archive does not exclude information because it is trivial, dull, or seemingly unimportant. What separates Alexa from other search engines is that it lets users view sites that have been removed from the Web. When they encounter the message "404 Document Not Found," users can click on the Alexa toolbar to fetch the out-of-print Web page from the Internet Archive. Browser and search companies, meanwhile, are busy snapping up technology that improves Web navigation. Lycos, for instance, spent 39.75 million dollars for WiseWire, which automatically organises Internet content into directories and categories. In April 1998, Microsoft shelled out a reported 40 million dollars for Firefly, developed by Pattie Maes at MIT, which recommends content to Web surfers based on profiles they submit. (Said, 1998, pg. B3.)
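The retrieval behaviour described above, in which a missing page is served from an archived copy stored with its metadata, can be illustrated with a minimal sketch. It is not Alexa's or the Internet Archive's actual implementation, which the source does not describe. The names Snapshot, Archive, fetch and fake_live_fetch are hypothetical, the archive is an in-memory dictionary rather than a tape jukebox, and the "live Web" is simulated so that the example runs without a network connection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Snapshot:
    """One archived copy of a page, with the kinds of metadata the text lists."""
    url: str
    captured: str   # capture date as an ISO string (an assumption of this sketch)
    body: str
    visits: int = 0                                        # how many people visited
    next_sites: List[str] = field(default_factory=list)    # where visitors went next
    linked_from: List[str] = field(default_factory=list)   # pages that link to it


class Archive:
    """Hypothetical in-memory archive keyed by URL."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, List[Snapshot]] = {}

    def store(self, snap: Snapshot) -> None:
        # Every capture is kept; nothing is excluded as trivial or dull.
        self._snapshots.setdefault(snap.url, []).append(snap)

    def latest(self, url: str) -> Optional[Snapshot]:
        snaps = self._snapshots.get(url)
        if not snaps:
            return None
        # ISO date strings sort chronologically when compared as text.
        return max(snaps, key=lambda s: s.captured)


def fetch(url: str,
          live_fetch: Callable[[str], Tuple[int, str]],
          archive: Archive) -> Tuple[str, str]:
    """Return (source, body); fall back to the archive when the live Web says 404."""
    status, body = live_fetch(url)
    if status != 404:
        return "live", body
    snap = archive.latest(url)
    if snap is None:
        return "missing", ""
    return "archived", snap.body


# Simulated live Web: one page still exists, everything else returns 404.
live_web = {"http://example.org/": "current page"}


def fake_live_fetch(url: str) -> Tuple[int, str]:
    if url in live_web:
        return 200, live_web[url]
    return 404, ""


archive = Archive()
archive.store(Snapshot("http://example.org/old", "1997-03-01", "first version"))
archive.store(Snapshot("http://example.org/old", "1998-01-15", "second version",
                       visits=12, linked_from=["http://example.org/"]))

print(fetch("http://example.org/old", fake_live_fetch, archive))
```

In this sketch the archive keeps every capture and returns the most recent one when a page has vanished; a real service would presumably also let the user choose among earlier captures.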
5.9.2 Kahle justifiably worries about moves to institute laws that would make Internet archiving illegal. His efforts to archive the WWW implicitly address the fact that the archiving of non-print materials is far more problematic, in terms of cultural practice and focus, than that of print materials. A good example is the documentation and preservation of television. Print archiving has been a cultural priority at least since the Library of Alexandria, whereas television has relatively few archives, and those are held by relatively inaccessible institutions such as the Museum of Broadcasting. Although television has functioned as a premier cultural artefact of the latter half of this century, it is only now, as it faces radical change, that it is finally becoming clear that much of our heritage is in electronic form and should be well preserved as such. Even more dire, perhaps, is the cultural position of video art: work from the late 1960s and early 1970s is fast deteriorating, with no funds being allocated towards its preservation and digitisation. The digitisation of our collective knowledge is selective, after all, and seems to lean towards documenting the present rather than preserving the past.