A signature-based information retrieval system for office use

Pao Yue-kong Library Electronic Theses Database


Author: Chan, Shui-kuen
Title: A signature-based information retrieval system for office use
Degree: M.Sc.
Year: 1998
Subject: Text processing (Computer science)
Information storage and retrieval systems
Chinese language -- Data processing
Electronic data processing
Office practice -- Automation
Hong Kong Polytechnic University -- Dissertations
Department: Multi-disciplinary Studies
Dept. of Computing
Pages: iv, 98 leaves : ill. ; 30 cm
Language: English
InnoPac Record: http://library.polyu.edu.hk/record=b1436943
URI: http://theses.lib.polyu.edu.hk/handle/200/448
Abstract: Office documents are produced every day, and manual filing and searching are labor-intensive and slow. Information retrieval provides timely access to a large set of documents. In Hong Kong, documents are usually written in English and Chinese. This dissertation aims to extend the variable bit-block compression signature method, which is well suited to installation in an office, to indexing and searching English/Chinese documents. In the proposed system, the signature of each document is generated by a batch process consisting of three sub-processes. First, a file list containing all accessible text files and their relevant information is generated. Second, by scanning the file list, a library file containing all terms in all accessible text files is created. Third, by scanning the file list and the library file, a signature file containing the signatures of all accessible text files is generated. After signature generation, office staff can retrieve relevant documents by submitting a query on a web page. The query is passed to a signature retrieval program through the Common Gateway Interface (CGI) specification. The signature retrieval program scans the signature file and returns the relevant documents on another web page. In addition, queries can be written as Boolean expressions using conjunction and disjunction. On a small set of test queries (about 30 cases), the average recall and precision are 1 and 0.933, respectively.
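
The three-stage batch process and the AND/OR query described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it uses plain, uncompressed superimposed-coding signatures instead of variable bit-block compression, and the signature width F, the bits per term K, the SHA-256-based bit selection, the tokenizer (lowercase English words plus one term per Han character), and every function name are hypothetical. A query is a list of disjuncts, each a string whose terms are ANDed. Candidates that pass the signature test are re-checked against the file text to discard false drops; the abstract does not say whether the original system does this.

```python
import hashlib
import re

F = 64  # assumed signature width in bits (hypothetical)
K = 3   # assumed number of bits set per term (hypothetical)

# Hypothetical tokenizer: lowercase English words/digits, one term per Han character.
TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def terms(text):
    return set(TOKEN.findall(text.lower()))


def term_signature(term):
    """Superimposed coding: set K of the F bit positions, chosen from a SHA-256 hash."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    sig = 0
    for i in range(K):
        sig |= 1 << (digest[i] % F)
    return sig


def build_index(file_list):
    """file_list maps path -> text (stage 1). Returns (library, signature_file)."""
    # Stage 2: the library holds every term of every accessible file.
    library = sorted(set().union(*(terms(text) for text in file_list.values())))
    # Stage 3: OR together the term signatures of each file.
    signature_file = {}
    for path, text in file_list.items():
        sig = 0
        for term in terms(text):
            sig |= term_signature(term)
        signature_file[path] = sig
    return library, signature_file


def search(query, signature_file, file_list):
    """query is a list of disjuncts; the terms inside each disjunct are ANDed."""
    hits = []
    for path, sig in signature_file.items():
        for disjunct in query:
            wanted = terms(disjunct)
            mask = 0
            for term in wanted:
                mask |= term_signature(term)
            # The signature test never misses a true match but may admit false
            # drops, so candidates are confirmed against the file's terms.
            if sig & mask == mask and wanted <= terms(file_list[path]):
                hits.append(path)
                break
    return sorted(hits)


def recall_precision(retrieved, relevant):
    """Per-query recall and precision; empty sets score 1.0 by assumption."""
    retrieved, relevant = set(retrieved), set(relevant)
    found = len(retrieved & relevant)
    recall = found / len(relevant) if relevant else 1.0
    precision = found / len(retrieved) if retrieved else 1.0
    return recall, precision


if __name__ == "__main__":
    docs = {
        "memo1.txt": "Signature file method for office documents",
        "memo2.txt": "會議 minutes: budget review",
        "memo3.txt": "Budget signature approval",
    }
    library, sigs = build_index(docs)
    # (signature AND file) OR 會
    result = search(["signature file", "會"], sigs, docs)
    print(result)
    print(recall_precision(result, ["memo1.txt"]))
```

Running the script prints `['memo1.txt', 'memo2.txt']` and then `(1.0, 0.5)`. Averaging the per-query recall and precision over a set of test queries yields summary figures of the same form as those reported in the abstract.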

Files in this item

File            Size      Format
b14369436.pdf   2.653Mb   PDF