Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/29562
Title: Using semantics in XML query processing
Authors: WU HUAYU
Keywords: XML query processing, twig pattern query, semantic approach, content search, VERT, aggregation
Issue Date: 21-Mar-2011
Citation: WU HUAYU (2011-03-21). Using semantics in XML query processing. ScholarBank@NUS Repository.
Abstract: As more and more information is stored in XML format, querying XML data efficiently becomes increasingly important. In this thesis, we make use of semantic information, e.g., value, property, object and relationship among objects, to improve the efficiency of XML query processing. We focus on matching a twig pattern, which is considered the core pattern of XML queries, to an XML tree. We also show that our approach can be extended to handle queries with ID references and queries across multiple twig patterns in one or multiple documents. The main idea of our research is to capture this semantic information and to incorporate relational tables as indexes that reflect it. In the first part of this thesis, we propose a novel twig pattern matching algorithm, VERT, which solves the problems regarding values in existing twig pattern matching algorithms. In VERT we model a twig pattern query as two parts, structural search and content search, and use property-based relational tables and inverted lists to perform the two types of search separately during query processing. We show that our approach not only handles the problems in value management and content search found in other twig pattern matching approaches, but also improves query processing performance. Later, we propose three optimizations that further integrate object-based semantic information into the tables, to reduce the number of structural joins required to process a query. Furthermore, our approach can efficiently process general queries joining several twig patterns and queries with ID references. Finally, after twig pattern matching, VERT can return actual values, instead of node labels as in other twig pattern matching approaches. Based on VERT, we propose two extensions to the twig pattern query to enhance its expressivity and to support grouping and aggregation in queries.
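The split between structural search and content search described above can be illustrated with a minimal sketch. This is not the VERT algorithm from the thesis: the toy document, the containment labels (start, end, level), the table layout, and every function name below are assumptions made for illustration, and a naive nested loop stands in for a real structural join. The sketch answers a query of the form "price of each book whose title equals a given value" by looking the value up in a property table first and only then checking the parent-child structure.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """Hypothetical containment label of an XML node."""
    start: int
    end: int
    level: int


def is_parent(parent: Label, child: Label) -> bool:
    """True if `child` lies inside `parent` exactly one level deeper."""
    return (parent.start < child.start
            and child.end < parent.end
            and child.level == parent.level + 1)


# Inverted list: element tag -> labels of the nodes with that tag (assumed data).
element_lists = {"book": [Label(1, 10, 1), Label(11, 20, 1)]}

# Property tables: property tag -> (label of the property node, its value).
property_tables = {
    "title": [(Label(2, 3, 2), "XML"), (Label(12, 13, 2), "SQL")],
    "price": [(Label(4, 5, 2), "30"), (Label(14, 15, 2), "25")],
}


def content_search(prop: str, value: str) -> list[Label]:
    """Content search: find property nodes holding `value` via the table."""
    return [label for label, v in property_tables[prop] if v == value]


def prices_of_books_titled(title: str) -> list[str]:
    """Structural search over labels, returning values rather than labels."""
    title_labels = content_search("title", title)
    results = []
    for book in element_lists["book"]:
        if any(is_parent(book, t) for t in title_labels):
            results.extend(v for label, v in property_tables["price"]
                           if is_parent(book, label))
    return results
```

Because the price table stores values next to labels, the result is a list of actual values, which is the property the abstract attributes to semantic tables.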
The second part of the thesis studies the characteristics of query nodes in a twig pattern query, i.e., their purpose, optionality and occurrence, based on which the query nodes are classified into six types. We focus on output information, and propose TP+Output to extend the existing twig pattern query to explicitly express each type of output node. Using TP+Output, a query with complex output information can be expressed by fewer tree-structured query patterns than the original twig pattern query requires. By extending VERT to efficiently match TP+Output queries, a query with complex output can be solved with fewer structural joins than existing approaches that use the original twig pattern query.
In the third part of the thesis, we propose an algorithm to physically perform grouping and aggregation in XML queries. Existing twig pattern query processing approaches can hardly be extended to support grouping and aggregation, because they normally return node labels rather than actual values as results. In our approach, we model such a query by separating its core query pattern from the grouping and aggregation operations. We first use the VERT algorithm to match query patterns to documents. Since VERT can return value answers directly using semantic tables, the matching result is ready for any post-processing, e.g., grouping and aggregation. Finally, we design a recursive method to analyze nested and parallel grouping operations in the query, and perform grouping and aggregation over the intermediate result returned by VERT. Overall, this thesis demonstrates theoretically and experimentally that using semantic information to process XML queries brings substantial benefits in terms of efficiency. This result should be useful for future research and applications in XML query processing.
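The idea of applying grouping and aggregation as a post-processing step over value results can be sketched as follows. This is only an illustration under stated assumptions, not the recursive method designed in the thesis: the row format (dictionaries of already-matched values), the function name nested_group, and the sample data are all hypothetical. The sketch groups rows by a list of keys, one nesting level per key, and applies an aggregate function at the innermost level.

```python
from collections import defaultdict
from typing import Any, Callable


def nested_group(rows: list[dict], keys: list[str],
                 aggregate: Callable[[list[dict]], Any]) -> Any:
    """Recursively group `rows` by each key in `keys`, then aggregate."""
    if not keys:
        # Innermost level: no grouping keys left, so compute the aggregate.
        return aggregate(rows)
    groups = defaultdict(list)
    for row in rows:
        groups[row[keys[0]]].append(row)
    # Recurse into each group with the remaining keys.
    return {value: nested_group(members, keys[1:], aggregate)
            for value, members in groups.items()}


def average_price(rows: list[dict]) -> float:
    """Aggregate function: mean of the 'price' values in a group."""
    return sum(row["price"] for row in rows) / len(rows)


# Assumed intermediate result: value tuples as a matcher might return them.
matched_rows = [
    {"publisher": "A", "year": "2009", "price": 30.0},
    {"publisher": "A", "year": "2009", "price": 20.0},
    {"publisher": "A", "year": "2010", "price": 40.0},
    {"publisher": "B", "year": "2009", "price": 10.0},
]

result = nested_group(matched_rows, ["publisher", "year"], average_price)
```

The sketch relies on the rows holding values rather than node labels. With label-only results, each group key and each aggregated value would first have to be fetched from the document.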
URI: http://scholarbank.nus.edu.sg/handle/10635/29562
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: WuHY.pdf
Size: 2.96 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.