Optimality and scalability in lattice histogram construction | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/39529

DC Field	Value
dc.title	Optimality and scalability in lattice histogram construction
dc.contributor.author	Karras, P.
dc.date.accessioned	2013-07-04T07:43:40Z
dc.date.available	2013-07-04T07:43:40Z
dc.date.issued	2009
dc.identifier.citation	Karras, P. (2009). Optimality and scalability in lattice histogram construction. Proceedings of the VLDB Endowment 2 (1) : 670-681. ScholarBank@NUS Repository.
dc.identifier.issn	21508097
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/39529
dc.description.abstract	The Lattice Histogram is a recently proposed data summarization technique that achieves approximation quality preferable to that of an optimal plain histogram. Like other hierarchical synopsis methods, a lattice histogram (LH) aims to approximate data using a hierarchical structure. Still, this structure is not defined a priori; it consists an unknown, not a given, of the problem. Past work has defined the properties that an LH needs to obey and developed general-purpose approximation algorithms for the construction thereof. Still, two major issues remain unaddressed: First, the construction of an optimal LH for a given error metric is a problem unsolved to date. Second, the proposed algorithms suffer from too high space and time complexities that render their application in real-world settings problematic. In this paper, we address both these questions, focusing on the case that the target error metric is a maximum error metric. Our algorithms treat both the error-bounded LH construction problem, in which the space occupied by an LH is minimized under an error constraint, as well as the classic space-bounded problem. First, we develop a dynamicprogramming scheme that detects an optimal LH under a given maximum-error bound. Second, we propose an efficient, practical, greedy algorithm that solves the same problem with much lower time and space requirements. Then, we show how both our algorithms can be applied to the classic space-bounded problem, aiming at minimizing error under a bound on space. Our experimental study with real-world data sets shows the effectiveness of our methods compared to competing summarization techniques. Moreover, our findings show that our greedy heuristic performs almost as well as the optimal solution in terms of accuracy. © 2009 VLDB Endowment.
dc.source	Scopus
dc.type	Article
dc.contributor.department	COMPUTER SCIENCE
dc.description.sourcetitle	Proceedings of the VLDB Endowment
dc.description.volume	2
dc.description.issue	1
dc.description.page	670-681
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Show simple item record

Files in This Item:

There are no files associated with this item.

Google Scholar^TM

Check

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.