Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/182364
Title: SIGNATURE FILES AND THEIR APPLICATIONS IN INFORMATION RETRIEVAL
Authors: ZHANG HUA
Issue Date: 1996
Citation: ZHANG HUA (1996). SIGNATURE FILES AND THEIR APPLICATIONS IN INFORMATION RETRIEVAL. ScholarBank@NUS Repository.
Abstract: Information Retrieval (IR) is a subfield of computer science applications that deals with the automated storage and retrieval of documents. IR is now widely used, and its importance continues to grow. The file structures used in an IR system play a central role in the whole system, because they are among the main factors determining its retrieval speed and storage size. This is the impetus for our research in this field.

The first part of the work designs a Modified Inverted List Scheme (MILS) for uncertainty retrieval. In essence, uncertainty retrieval on documents or text is a substring-matching problem. Traditional text retrieval has usually focused on matching a single substring; the uncertainty text retrieval studied in this thesis goes beyond this. We propose MILS to support efficient processing of uncertainty retrieval, and we also describe a prototype of an easy-to-use query language that helps users issue uncertainty queries. We also outline a preliminary combined scheme that applies a signature file to improve the MILS structure.

The second part designs a signature file structure for partial match retrieval. A partial match query is a query in which some of the attributes are unspecified. In recent years, a new class of file structures that combines a multikey hashing scheme with a signature file technique has been proposed for partial match retrieval. However, several important design issues for signature files in partial match retrieval have not been addressed. One of the major issues is the optimal assignment of the signature file to buckets. An interesting finding of this thesis is that the relationship between the attributes strongly influences the design of a good signature file. In addition, we propose an attribute-partitioning signature file for partial match retrieval.
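To make the combination of multikey hashing and signature files concrete, here is a minimal generic sketch, not the thesis's design. It assumes each attribute contributes a fixed number of address bits (`DEPTHS`), each bucket keeps one superimposed-coding signature of `F` bits with `M` hashed bits per attribute value, and an unspecified attribute is written as `None`. All of these names and parameters (`F`, `M`, `DEPTHS`, `SignatureHashFile`) are hypothetical, and the thesis's bucket-assignment and attribute-partitioning schemes are not reproduced here.

```python
import hashlib
from itertools import product

F = 32           # bucket-signature width in bits (hypothetical)
M = 3            # bits set per attribute value (hypothetical)
DEPTHS = (1, 2)  # address bits contributed by each attribute (hypothetical)

def _h(text: str) -> int:
    # Stable 32-bit hash (hashlib rather than hash(), so results don't vary between runs)
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")

def value_bits(attr: int, value: str) -> int:
    """Superimposed code for one attribute value: M hashed bit positions out of F."""
    sig = 0
    for i in range(M):
        sig |= 1 << (_h(f"sig:{attr}:{value}:{i}") % F)
    return sig

def signature(values) -> int:
    """OR together the codes of all specified values; None contributes no bits."""
    sig = 0
    for attr, value in enumerate(values):
        if value is not None:
            sig |= value_bits(attr, value)
    return sig

def address(record) -> int:
    """Multi-attribute hashing: concatenate DEPTHS[attr] hash bits from each attribute."""
    addr = 0
    for attr, value in enumerate(record):
        addr = (addr << DEPTHS[attr]) | (_h(f"addr:{attr}:{value}") % (1 << DEPTHS[attr]))
    return addr

def candidate_addresses(query):
    """Yield every bucket address consistent with the specified attributes."""
    choices = []
    for attr, value in enumerate(query):
        width = 1 << DEPTHS[attr]
        if value is None:
            choices.append(range(width))  # unspecified: every hash value is possible
        else:
            choices.append([_h(f"addr:{attr}:{value}") % width])
    for parts in product(*choices):
        addr = 0
        for attr, part in enumerate(parts):
            addr = (addr << DEPTHS[attr]) | part
        yield addr

class SignatureHashFile:
    def __init__(self):
        self.buckets = {}      # address -> list of records
        self.bucket_sigs = {}  # address -> superimposed signature of all records in the bucket

    def insert(self, record):
        a = address(record)
        self.buckets.setdefault(a, []).append(record)
        self.bucket_sigs[a] = self.bucket_sigs.get(a, 0) | signature(record)

    def partial_match(self, query):
        """Return (matching records, number of buckets actually read)."""
        qsig = signature(query)
        results, examined = [], 0
        for a in candidate_addresses(query):
            # Skip empty buckets and buckets whose signature cannot cover the query
            if a not in self.buckets or self.bucket_sigs[a] & qsig != qsig:
                continue
            examined += 1
            results += [r for r in self.buckets[a]
                        if all(q is None or q == v for q, v in zip(query, r))]
        return results, examined

f = SignatureHashFile()
for rec in [("alice", "cs"), ("bob", "ee"), ("carol", "cs"), ("dave", "math")]:
    f.insert(rec)
matches, examined = f.partial_match((None, "cs"))  # first attribute unspecified
# sorted(matches) == [("alice", "cs"), ("carol", "cs")]
```

Because a matching record's signature always contains the query's bits, the bucket signature never causes a qualifying bucket to be skipped; it can only let some non-qualifying buckets through (false drops), which are then removed by the exact record check.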
In the third part of this thesis, we study signature false drops caused by combinatorial error. Multi-attribute hashing is a scheme well suited to partial match retrieval, and signature files can be used with multi-attribute hashing to avoid many unnecessary searches. A serious problem when applying signature files to partial match retrieval, however, is the combinatorial error this introduces. We present an analytic method that estimates the false drop probability caused by combinatorial errors, validate it by simulation, and give a detailed analysis. In the last part of the work, we discuss areas for future research in this field and present some preliminary ideas.
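The combinatorial error mentioned above can be illustrated with a small generic sketch, assuming the usual setting in which one signature covers all records in a bucket: the bucket signature can match a query because different records each contribute part of the query's bits, even though no single record qualifies. To isolate this effect from ordinary hash collisions, the sketch gives every (attribute, value) pair its own dedicated bit. The functions `signature`, `bucket_signature`, `qualifies` and `combinatorial_false_drop_rate` are hypothetical names, and the Monte Carlo estimate below is only an illustration; it is not the analytic method developed in the thesis.

```python
import random

def signature(record, domain):
    """One dedicated bit per (attribute, value): no two values share a bit, so any
    false drop measured below comes purely from combining values of different records."""
    sig = 0
    for attr, value in enumerate(record):
        if value is not None:
            sig |= 1 << (attr * domain + value)
    return sig

def bucket_signature(bucket, domain):
    """Superimpose (OR) the signatures of every record in the bucket."""
    sig = 0
    for record in bucket:
        sig |= signature(record, domain)
    return sig

def qualifies(bucket, query):
    """True if at least one record in the bucket really matches the query."""
    return any(all(q is None or q == v for q, v in zip(query, r)) for r in bucket)

def combinatorial_false_drop_rate(n_records, domain, n_attrs=2, trials=20_000, seed=1):
    """Estimate P(bucket signature matches | no record qualifies) by simulation."""
    rng = random.Random(seed)
    false_drops = non_qualifying = 0
    for _ in range(trials):
        bucket = [tuple(rng.randrange(domain) for _ in range(n_attrs)) for _ in range(n_records)]
        query = tuple(rng.randrange(domain) for _ in range(n_attrs))
        if qualifies(bucket, query):
            continue
        non_qualifying += 1
        qsig = signature(query, domain)
        if bucket_signature(bucket, domain) & qsig == qsig:
            false_drops += 1
    return false_drops / non_qualifying if non_qualifying else 0.0

# Neither record matches (0, 3), yet together they set all of its bits.
bucket = [(0, 1), (2, 3)]
query = (0, 3)
qsig = signature(query, domain=4)
print(bucket_signature(bucket, 4) & qsig == qsig, qualifies(bucket, query))  # True False
```

With a single record per bucket the estimated rate is zero, since disjoint codes make a signature match equivalent to an exact match; the error appears only once several records share one signature.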
URI: https://scholarbank.nus.edu.sg/handle/10635/182364
Appears in Collections: Master's Theses (Restricted)

Files in This Item:
B20097888.PDF (2.88 MB, Adobe PDF; access: restricted)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.