Title: Query processing in peer-to-peer based data management system
Authors: WU SAI
Keywords: P2P, BATON, Query Processing, Indexing, Query Optimization, Aggregation
Issue Date: 4-Mar-2011
Citation: WU SAI (2011-03-04). Query processing in peer-to-peer based data management system. ScholarBank@NUS Repository.
Abstract: <div> In the last ten years, we have witnessed the success of P2P (Peer-to-Peer) networks, which facilitate information sharing on an unprecedented scale. Some popular applications, such as Skype and Emule, are deployed to serve millions of users. Although well recognised for their scalability, current P2P networks lack state-of-the-art data management systems, especially for enterprise applications. To address this problem, the database community has attempted to integrate database technologies into P2P networks, and various PDMSs (Peer-based Data Management Systems) have been proposed. In this thesis, we design an efficient query processing framework for the PDMS. The framework consists of a query optimizer and three processing approaches tailored for different types of queries. <ul> <li> For simple OLTP queries, the optimizer uses the distributed index to process them. To reduce the maintenance cost of indexes, we propose a just-in-time indexing approach. Instead of indexing the whole dataset, we selectively publish the data based on the query patterns. </li> <li> For multi-way join queries, the optimizer adopts an adaptive join strategy. It first generates an initial query plan based on the distributed histograms. Since the histograms provide only a coarse estimate, the optimizer periodically adjusts the plan by exploiting the real-time query results. </li> <li> When a small amount of inaccuracy can be tolerated, the optimizer switches to an approximate OLAP query processing algorithm. The algorithm continuously retrieves random samples from the PDMS, and approximate results are generated and refined based on the samples. </li> </ul> The query optimizer selects the corresponding processing scheme and exploits the distributed histograms to optimize the query plan. The proposed approaches are evaluated on a real distributed platform, PlanetLab, using TPC-H queries and datasets as the benchmark. </div>
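The sample-and-refine idea behind the approximate OLAP approach can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes a single in-memory list stands in for the data held by the peers, uniform sampling with replacement, and a normal-approximation 95% confidence interval. The function name `online_sum_estimate` and its parameters are hypothetical.

```python
import math
import random


def online_sum_estimate(population, batch_size, rounds, seed=0):
    """Yield (estimate, half_width) for SUM(population) after each sample batch.

    Assumption: `population` is a local list standing in for data spread
    across peers; a real PDMS would draw samples over the network.
    """
    rng = random.Random(seed)
    n_total = len(population)
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for _ in range(rounds):
        for _ in range(batch_size):
            x = rng.choice(population)  # uniform sample, with replacement
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
        variance = m2 / (count - 1) if count > 1 else 0.0
        # Scale the sample mean up to a SUM; 1.96 gives a ~95% interval.
        half_width = 1.96 * n_total * math.sqrt(variance / count)
        yield n_total * mean, half_width


if __name__ == "__main__":
    data = list(range(1000))
    for estimate, half_width in online_sum_estimate(data, batch_size=200, rounds=5):
        print(f"SUM ~ {estimate:.0f} +/- {half_width:.0f}")
```

Each round reuses all samples drawn so far, so the interval tends to narrow as more samples arrive, which is the sense in which the approximate result is refined.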
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: WuS.pdf
Size: 1.31 MB
Format: Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.