Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/22134
Title: Towards an effective processing of XML keyword query
Authors: BAO ZHIFENG
Keywords: XML, Keyword search
Issue Date: 29-Nov-2010
Citation: BAO ZHIFENG (2010-11-29). Towards an effective processing of XML keyword query. ScholarBank@NUS Repository.
Abstract: Inspired by the great success of information retrieval (IR) style keyword search on the web, keyword search over XML data has emerged recently. Compared to keyword search on the web, XML keyword search brings several new challenges. (1) The target that a user query intends to search for is usually unknown or implicit. (2) The keyword ambiguity problem: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text value of different XML node types and carry different meanings; a keyword can appear as the tag name of different XML node types with different meanings. This ambiguity further obstructs identifying the constraints that a user query intends to search via. (3) The hierarchical structure of XML data has to be taken into account in devising the matching semantics and result ranking scheme. This dissertation discusses three aspects of constructing an effective XML keyword search engine while addressing the above challenges. First, we study keyword search over an XML data tree in which ID references are not captured. In particular, we propose a statistics-based approach to identify the target(s) that a user query intends to search for, quantify the likelihood of different search intentions in result ranking, and end by designing an XML Term Frequency * Inverse Document Frequency (XML TF*IDF) result ranking scheme. Second, we realize that by taking the ID references among elements in XML data into consideration, more relevant results can be found. By identifying the objects of interest from the given semantic information of XML data, we model XML data as a set of object trees that are interconnected by either containment or reference edges, and propose a series of matching semantics at the object tree level. 
As a result, users' search concerns regarding real-world objects can be precisely captured; by distinguishing the containment and reference edges in XML data, the efficiency of matching result generation is improved compared to previous work on keyword search over general directed graphs. Third, we observe that user queries may contain irrelevant or mismatched terms, typos, etc., which may easily lead to nonsensical or empty results. Effective query refinement is therefore a highly demanded functionality of an XML keyword search engine. Specifically, we propose a novel query ranking model to quantify the confidence of a refined query (RQ) candidate, which can capture the morphological/semantic similarity between the original query Q and RQ and the dependency of the keywords of RQ over the XML data. Besides, we treat the job of looking for RQ candidates and generating their matching results as a single problem, thus guaranteeing the existence of meaningful matching results for the suggested RQs. By incorporating the above proposed techniques, a keyword search engine prototype has been built. Through a comprehensive experimental study on both real-life and synthetic data sets, the proposed solutions are shown to be efficient, effective and scalable.
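Illustrative note (not part of the thesis): the keyword ambiguity problem described in challenge (2) can be made concrete with a small sketch. The sample document, the function name keyword_roles, and the whole-word matching rule below are assumptions chosen only for illustration; they do not reproduce the thesis's method. The sketch reports whether a keyword occurs as a tag name, as a text value, or both, and under which node types.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data, invented for this illustration.
SAMPLE = """
<bib>
  <book><title>XML search</title><editor>Smith</editor></book>
  <article><title>Editor tools</title><author>Smith</author></article>
</bib>
""".strip()


def keyword_roles(xml_text, keyword):
    """Return {'tag': {...}, 'text': {...}} listing the node types (tag names)
    where the keyword occurs as a tag name or as a word in a text value."""
    kw = keyword.lower()
    roles = defaultdict(set)
    for elem in ET.fromstring(xml_text).iter():
        # Role 1: the keyword matches the element's tag name.
        if elem.tag.lower() == kw:
            roles["tag"].add(elem.tag)
        # Role 2: the keyword appears as a whole word in the element's own text.
        words = (elem.text or "").lower().split()
        if kw in words:
            roles["text"].add(elem.tag)
    return dict(roles)


if __name__ == "__main__":
    # "editor" is both a tag name and a text value of a title node.
    print(keyword_roles(SAMPLE, "editor"))
    # "smith" is a text value under two different node types.
    print(keyword_roles(SAMPLE, "smith"))
```

In this toy document the keyword "editor" is ambiguous between a tag name and a title word, while "smith" appears as the text value of both editor and author nodes, so a search engine cannot tell from the keyword alone which meaning the user intends.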
URI: http://scholarbank.nus.edu.sg/handle/10635/22134
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: BAOZF.pdf | Size: 1.37 MB | Format: Adobe PDF | Access Settings: OPEN | Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.