This approach has several drawbacks. First of all, many of the pages no longer exist or are outdated and no longer contain the keyword. This is because "web searches" are not actually performed when the user issues a request but in advance, possibly several months earlier.
Another drawback is that search engines cannot perform "individualized searches". Users can normally only search for the presence or absence of keywords; more complex searches cannot be expressed. If a user wants to find the best price for a flight to Hawaii and then check for available hotels on the dates found (and maybe even book them), much of this must still be done manually. One hope is that standards such as XML will solve this problem, but they would only define the "semantics" of web pages, not how to search in them.
Finally, additional information to control the search would be useful. For example, statistics on web page accesses or the number of references to a certain page, as used in the Google system, could improve the quality of the retrieved data and help filter out unwanted information.
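To make the idea of link-based filtering concrete, the following toy sketch ranks pages by how many other pages reference them. All names and counts here are hypothetical illustrations; real systems such as Google use far more elaborate measures than a raw inbound-link count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy ranking: order pages by the number of pages that reference them.
public class LinkCountRanking {
    static List<String> rank(Map<String, Integer> inboundLinks) {
        List<String> pages = new ArrayList<>(inboundLinks.keySet());
        // Sort descending by inbound-link count: most-referenced first.
        pages.sort((a, b) -> inboundLinks.get(b) - inboundLinks.get(a));
        return pages;
    }

    public static void main(String[] args) {
        Map<String, Integer> links = new HashMap<>();
        links.put("pageA", 3);   // hypothetical: referenced by 3 pages
        links.put("pageB", 17);
        links.put("pageC", 1);
        System.out.println(rank(links)); // most-referenced page first
    }
}
```

A search agent could use such a score to drop sparsely referenced pages from its result set before returning it to the user.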
The development of the system described in this document aims to address these drawbacks by providing a new architecture for information storage and retrieval based on agent technology. The use of agents (we define agents here simply as mobile objects that can query their environment in certain ways and act upon the results of these queries) helps overcome the drawbacks mentioned above: agents perform searches immediately, with only minimally outdated information thanks to a hierarchical caching approach discussed below, and searches can be individualized since the agents and their search strategies are programmed in Java.
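The agent definition above can be sketched in Java as follows. This is a minimal illustration, not the system's actual interface; `Environment`, `SearchAgent`, and `pagesContaining` are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface through which an agent queries its environment.
interface Environment {
    List<String> pagesContaining(String keyword);
}

// A simple search agent: queries its environment for a keyword and
// keeps only the pages that pass a programmable filter.
class SearchAgent {
    private final String keyword;

    SearchAgent(String keyword) { this.keyword = keyword; }

    List<String> run(Environment env) {
        List<String> results = new ArrayList<>();
        for (String page : env.pagesContaining(keyword)) {
            if (accept(page)) {
                results.add(page);
            }
        }
        return results;
    }

    // The search strategy is ordinary Java code; subclasses override
    // this method to individualize the search.
    boolean accept(String page) { return true; }
}

public class AgentDemo {
    public static void main(String[] args) {
        // Stub environment returning fixed pages (illustrative only).
        Environment env = kw -> {
            List<String> pages = new ArrayList<>();
            pages.add("http://example.org/flights");
            pages.add("http://example.org/hotels");
            return pages;
        };
        // An individualized agent that keeps only hotel pages.
        SearchAgent agent = new SearchAgent("Hawaii") {
            @Override boolean accept(String page) {
                return page.contains("hotels");
            }
        };
        System.out.println(agent.run(env));
    }
}
```

Because the filter is plain Java, a user could chain agents, e.g. one that finds the cheapest flight and a second that checks hotels for the dates the first one returned.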
As a sample application, the Information Personae project is presented. This project provides personalized information storage and retrieval.