Please use this identifier to cite or link to this item:
Title: Processing of an iceberg query on distributed and centralized databases
Keywords: Iceberg query, Distributed database, Iceberg semijoin
Issue Date: 2-Sep-2004
Citation: MOHAMMED KASIM IMTHIYAZ (2004-09-02). Processing of an iceberg query on distributed and centralized databases. ScholarBank@NUS Repository.
Abstract: Many practical applications, such as Data Warehousing, Market-basket analysis and Information Retrieval, rely on Iceberg queries. Such queries compute aggregate functions over an attribute or a set of attributes to find aggregate values above some specified threshold, and return the qualified tuples. They are called Iceberg queries, because the result is usually very small (i.e., the tip of an iceberg) compared to the large amount of input set (the iceberg). Nowadays these queries are computed over the distributed and centralized databases. In a distributed system, data is distributed to various services based on theme and/or location. It is a collection of sites connected on a common network. This distributed model is cheap, efficient, easily maintainable and steers clear of integration constraints like legacy constraints. The query processing in this model requires retrieval and transmission of data from different sites through the network, which is not only expensive, but also causes time delays. On the other hand, for the centralized databases, the storage space and time are major constraints. In this thesis we intend to provide query optimization frameworks for processing Iceberg queries on both models. Our intension is to reduce the transfer cost and storage space, time in distributed and centralized systems respectively. We apply Bloom filters in distributed system to save the transfer cost, and coalescing technique in centralized system to save the storage space.Multiple Filter Iceberg SemiJoin (MulFIS) is the first algorithm we proposed for the distributed system. Consider an iceberg semijoin query. a??Find all products in R which are at least sold T items in Sa??. R and S are stored in two remote servers R and S. MulFIS use multiple Bloom filters. It first use the Bloom filter of R to eliminate the non-joining tuples during the execution of iceberg query in S (finding the items sold greater than or equal to T is called iceberg query), also it can access the internal hash tables generated to process the iceberg query; these are used to construct accurate Bloom filters. These filters are exchanged between the processing sites enabling the elimination of unnecessary tuples at the early stages. MulFIS interleaves the execution of joining and iceberg query in the remote servers. Our results revealed that the network cost of distributed iceberg query is reduced by 80% compared to other naive methods.For the centralized database we proposed an algorithm Bottom-up Computation of Dwarf. We compute the data cube in a bottom-up manner by starting from the cuboid with a??ALLa??; to a cuboid on a single dimension; then on a pair of dimensions and so on. The algorithm partition the first dimension based on its cardinalities, and for each partition the algorithm recursively computes the cuboids of remaining dimensions. It switches to next partition, once it has computed all cuboids for the current partition. For sparse data cubes we expect more number of prefix and suffix redundancies in each partition. The algorithm coalesce the storage space of redundant tuples in each partition, in order to save the storage space and computation time. The result shows that the algorithm created a data cube by saving considerable amount of storage space and time.
Appears in Collections:Master's Theses (Open)

Show full item record
Files in This Item:
File Description SizeFormatAccess SettingsVersion 
Imthiyaz-HT016566N.pdf300.86 kBAdobe PDF



Page view(s)

checked on Apr 20, 2019


checked on Apr 20, 2019

Google ScholarTM


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.