Configurable Meta-search for Integrating Web Public Access Catalogs
Hou Ieong Ho and Jieh Hsiang
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
[email protected],
[email protected]
Abstract. A Web Public Access Catalog (WebPAC) is an important feature of modern libraries. In this paper we propose a meta-search method to provide users with simultaneous access to WebPACs of different libraries. Our method gives a librarian full freedom to select WebPACs to be incorporated in the service but requires no programming effort from the librarian’s side. At the core of our method is a meta-search engine which sends a query to incorporated WebPACs, receives results, and post-processes the query results into a uniform presentation format. To incorporate an existing WebPAC into our system, one needs to analyze the query interaction behavior between the WebPAC and the browser. This can be done by extracting the query parameters from a query and the subsequent query result web pages. We modeled and abstracted these interactions and defined the corresponding XML formats to capture the needed parameters from these web pages. The resulting XML pages will then be fed to the search engine which will automatically incorporate the designated WebPAC as part of its search. The advantage of our method is that the search engine does not need to be modified when new WebPACs are added. When adding a new WebPAC, the librarian only needs to analyze a few web pages to decide the parameters. Even this step can mostly be done automatically. To illustrate the effectiveness of our method, we have built a system, called MetaCat, that has incorporated the WebPACs of 26 major libraries in Taiwan. MetaCat can be accessed at http://MetaCat.ntu.edu.tw.
This research is supported in part by the National Science Council of the Republic of China under grant numbers NSC-94-2422-H-002-008 and NSC-93-2213-E-002-039.
E.A. Fox et al. (Eds.): ICADL 2005, LNCS 3815, pp. 317 – 322, 2005. © Springer-Verlag Berlin Heidelberg 2005
1 Introduction
The most important common service provided by modern libraries is the Web Public Access Catalog (WebPAC). By using a WebPAC, users can search a library’s catalog quickly via the Internet. However, trying to find books across several libraries can be a painful experience. The user needs to visit the WebPAC of every intended library and issue the same query to each of them separately. If the user does not have a clear idea of the books that she is looking for, going through the search results from those WebPACs can be another time-consuming experience. It is therefore reasonable to design an integrated search that can access several WebPACs simultaneously.
This service can be achieved by building a centralized union catalog (such as WorldCat of OCLC), by establishing standard data exchange protocols (such as Z39.50 [1] or OAI-PMH [2]), or by using meta-search (see [3] as an example).
In this paper we propose a new meta-search methodology that allows a librarian to build her library’s cross-WebPAC service without any programming effort. Our method involves a core search facility and an XML format that allows the incorporation of a WebPAC service by simply identifying the parameters involved in queries. To demonstrate the effectiveness of our method, we have implemented such a service, called MetaCat, for the National Taiwan University Library. MetaCat currently incorporates the WebPACs of 26 major libraries in Taiwan. It is also a popular search tool provided by the NTU Library.
In Section 2 of the paper we give the methodology of our configurable meta-search method. Section 3 describes the implementation of MetaCat. We conclude the paper with some discussion and future directions.
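As a rough illustration of the configuration-driven idea outlined in this introduction, the following Python sketch shows how a generic engine might turn an XML description of one WebPAC into a query request without any WebPAC-specific code. It is only a sketch under stated assumptions: the element names `webpac` and `param`, the `{query}` placeholder, the URL `webpac.example.org`, and the function `build_request` are all hypothetical and are not the XML format or the implementation used by MetaCat.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical description of one WebPAC. The element and attribute names
# are illustrative assumptions, not the XML format defined in the paper.
CONFIG = """
<webpac name="Example Library" action="http://webpac.example.org/search" method="GET">
  <param name="searchtype" value="t"/>
  <param name="searcharg" value="{query}"/>
</webpac>
"""


def build_request(config_xml, query):
    """Turn a WebPAC description plus a user query into (name, method, url)."""
    root = ET.fromstring(config_xml.strip())
    # Substitute the user's query into every parameter value that asks for it.
    params = [
        (p.get("name"), p.get("value").replace("{query}", query))
        for p in root.findall("param")
    ]
    url = root.get("action") + "?" + urlencode(params)
    return root.get("name"), root.get("method"), url


name, method, url = build_request(CONFIG, "digital libraries")
print(name, method, url)
```

Under these assumptions, adding another WebPAC would mean supplying another description of the same shape, while `build_request` itself stays unchanged.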
2 Methodology
Meta-search for WebPACs is a mechanism that allows users to access and search, via the Web, the WebPACs of different libraries from a single web page in a uniform way. In a typical (single) WebPAC service of a library, a user issues a query such as a title or author; the system then searches through the catalog of the library and returns a list of books (if any) that match the query. If we treat the inner workings of a WebPAC as a black box, then the query session described above can be regarded as a series of web page exchanges through HTTP. The query issued by the user is sent as a sequence of parameters, usually wrapped inside the control elements (buttons, checkboxes, radio buttons, menus, text input, file select, hidden controls, object controls, etc.) [9] of the