Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/169226
Title: QUERY OPTIMIZATION IN LOCALLY DISTRIBUTED DATABASE SYSTEMS
Authors: TAI CHIEW LAN
Issue Date: 1991
Citation: TAI CHIEW LAN (1991). QUERY OPTIMIZATION IN LOCALLY DISTRIBUTED DATABASE SYSTEMS. ScholarBank@NUS Repository.
Abstract: In this dissertation, a new query processing algorithm for locally distributed database systems is proposed. Experimental results obtained by earlier researchers have shown that communication cost is relatively insignificant compared to local processing cost in locally distributed database systems. Based on these results, several heuristics that simplify the process of query optimization in locally distributed environment are derived and a two-phase query optimization approach is proposed based on the derived heuristics. With such an approach, the problem of optimizing a multi-site query is separated into two phases : a plan generation phase and a site selection phase. In the plan generation phase, a logical processing plan is generated for a database query without considering the distribution of the data. The logical plan includes information such as the join order, join methods and access paths to be used to process the query. The actual processing sites and distributed join strategies are detem1ined in the site selection phase. A detailed site selection algorithm which considers various transfer strategies and join strategies when extending the logical plan into a distributed plan is given. In order to evaluate the performance of the proposed two-phase optimization algorithm, a simulation study was conducted to compare the plans obtained using this algorithm and those obtained using the exhaustive optimization algorithm of System R *. A simulated query planner that adopts the two-phase algorithm and another planner similar to that in System R* were both implemented on AT&T 3B4000 computer using the C language. Several sets of test queries were generated to investigate the effects of various factors on the performance of the algorithm. Among the factors investigated are the relation size, join selectivity, communication speed, the number of relations referenced by the query and the number of referenced relations available at each site, The simulation results show that for locally distributed database systems where the communication cost is relatively insignificant compared with the disk VO and CPU cost, the two-phase algorithm produces plans whose cost is very close to that of those produced by the exhaustive algorithm. An obvious advantage of such an approach is that the optimization process becomes simpler. The elimination of the data distribution parameter and thus the transfer strategy as well as the distributed join method in the first phase reduces the solution space considerably. The second phase requires O (nm) time to generate a distributed plan for an n -way join query in a distributed system with m sites. The separation of the generation of the logical plan and the generation of the global distributed plan also makes it possible to move the first phase to compilation time so as to reduce the runtime overhead of query optimization. The second phase can be optimized dynamically at execution time; the availability of current system load information allows load balancing techniques to be incorporated into query optimization to further improve the overall system performance.
URI: https://scholarbank.nus.edu.sg/handle/10635/169226
Appears in Collections:Master's Theses (Restricted)

Show full item record
Files in This Item:
File Description SizeFormatAccess SettingsVersion 
b1759666x.pdf3.17 MBAdobe PDF

RESTRICTED

NoneLog In

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.