Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/40983
DC Field | Value | |
---|---|---|
dc.title | RRPJ: Result-rate based progressive relational join | |
dc.contributor.author | Tok, W.H. | |
dc.contributor.author | Bressan, S. | |
dc.contributor.author | Lee, M.-L. | |
dc.date.accessioned | 2013-07-04T08:16:53Z | |
dc.date.available | 2013-07-04T08:16:53Z | |
dc.date.issued | 2007 | |
dc.identifier.citation | Tok, W.H., Bressan, S., Lee, M.-L. (2007). RRPJ: Result-rate based progressive relational join. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4443 LNCS: 43-54. ScholarBank@NUS Repository. | |
dc.identifier.isbn | 9783540717027 | |
dc.identifier.issn | 03029743 | |
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/40983 | |
dc.description.abstract | Progressive join algorithms are join algorithms that produce results incrementally as input data becomes available. Because they are non-blocking, they are particularly suitable for online processing of data streams. Reference algorithms of this family are the symmetric hash join, the X-Join and, more recently, the rate-based progressive join (RPJ). While the symmetric hash join introduces the idea of symmetric processing of the input streams but assumes sufficient main memory, the X-Join suggests that the processing can scale to very large amounts of data if main memory is regularly flushed to disk and a reactive/cleanup phase is triggered for disk-resident data. The X-Join flushing strategy is a simple largest-first strategy, in which the largest partition is flushed to disk. The recently proposed RPJ predicts the main memory tuples or partitions that should be flushed to disk in order to maximize throughput by computing their probabilities of contributing to a result. In this paper, we discuss the limitations of RPJ and propose a novel extension, called Result Rate-based Progressive Join (RRPJ), which addresses these limitations. Instead of computing the probabilities from statistics over the input data, RRPJ directly observes the output (result) statistics. This not only yields better performance, but also simplifies the generalization of the algorithm to non-relational data such as multidimensional data and hierarchical data. We empirically show that RRPJ is effective and efficient and outperforms the state-of-the-art RPJ. We also investigate the relevance and performance of an adaptive version of these algorithms using amortization parameters. © Springer-Verlag Berlin Heidelberg 2007. | |
dc.source | Scopus | |
dc.subject | Data streams | |
dc.subject | Join algorithms | |
dc.subject | Query processing | |
dc.type | Conference Paper | |
dc.contributor.department | COMPUTER SCIENCE | |
dc.description.sourcetitle | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | |
dc.description.volume | 4443 LNCS | |
dc.description.page | 43-54 | |
dc.identifier.isiut | NOT_IN_WOS | |
Appears in Collections: | Staff Publications |
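
The flushing idea summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `ResultRateJoinSketch`, the parameters `num_partitions` and `memory_limit`, integer join keys, partitioning by `key % num_partitions`, and the "fewest observed results per in-memory tuple" eviction score are all assumptions made for illustration. The sketch performs a symmetric hash join over two inputs, counts the results each partition has produced, and, when memory is over budget, moves the partition with the lowest observed result rate to a list standing in for disk. The reactive/cleanup phase over disk-resident data is omitted.

```python
from collections import defaultdict


class ResultRateJoinSketch:
    """Toy symmetric hash join with result-count-based flushing (illustrative only)."""

    def __init__(self, num_partitions=4, memory_limit=8):
        self.num_partitions = num_partitions
        self.memory_limit = memory_limit  # max tuples kept in memory (assumed budget)
        # tables[side][partition] maps a join key to the values held in memory
        self.tables = [
            [defaultdict(list) for _ in range(num_partitions)] for _ in range(2)
        ]
        self.sizes = [[0] * num_partitions for _ in range(2)]
        self.results = [0] * num_partitions  # results observed per partition
        self.disk = []  # stand-in for disk: (side, key, value) records

    def insert(self, side, key, value):
        """Probe the opposite side, store the new tuple, and return new results."""
        p = key % self.num_partitions  # assumes integer join keys
        matches = self.tables[1 - side][p].get(key, [])
        out = [(value, m) if side == 0 else (m, value) for m in matches]
        self.results[p] += len(out)  # output statistics, not input statistics
        self.tables[side][p][key].append(value)
        self.sizes[side][p] += 1
        if self._in_memory() > self.memory_limit:
            self._flush()
        return out

    def _in_memory(self):
        return sum(sum(side_sizes) for side_sizes in self.sizes)

    def _flush(self):
        """Evict the non-empty partition with the fewest results per tuple."""
        def rate(p):
            return self.results[p] / (self.sizes[0][p] + self.sizes[1][p])

        candidates = [
            p for p in range(self.num_partitions)
            if self.sizes[0][p] + self.sizes[1][p] > 0
        ]
        victim = min(candidates, key=rate)
        for side in (0, 1):
            for key, values in self.tables[side][victim].items():
                self.disk.extend((side, key, v) for v in values)
            self.tables[side][victim].clear()
            self.sizes[side][victim] = 0
        self.results[victim] = 0
        return victim
```

Calling `insert(0, key, value)` feeds the left input and `insert(1, key, value)` the right input; each call returns the join results that became available with that tuple. Replacing the `rate` function with the total partition size (and taking the maximum) would give a largest-first policy of the kind the abstract attributes to the X-Join.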