Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/77892
Title: Multiquery optimization in MapReduce framework
Authors: Wang, G.; Chan, C.Y.
Issue Date: Nov-2013
Citation: Wang, G., Chan, C.Y. (2013-11). Multiquery optimization in MapReduce framework. Proceedings of the VLDB Endowment 7 (3): 145-156. ScholarBank@NUS Repository.
Abstract: MapReduce has recently emerged as a new paradigm for large-scale data analysis due to its high scalability, fine-grained fault tolerance and easy programming model. Since different jobs often share similar work (e.g., several jobs scan the same input file or produce the same map output), there are many opportunities to optimize the performance for a batch of jobs. In this paper, we propose two new techniques for multi-job optimization in the MapReduce framework. The first is a generalized grouping technique (which generalizes the recently proposed MRShare technique) that merges multiple jobs into a single job, thereby enabling the merged jobs to share both the scan of the input file and the communication of the common map output. The second is a materialization technique that enables multiple jobs to share both the scan of the input file and the communication of the common map output via partial materialization of the map output of some jobs (in the map and/or reduce phase). Our second contribution is the proposal of a new optimization algorithm that, given an input batch of jobs, produces an optimal plan by a judicious partitioning of the jobs into groups and an optimal assignment of the processing technique to each group. Our experimental results on Hadoop demonstrate that our new approach significantly outperforms the state-of-the-art technique, MRShare, by up to 107%. © 2013 VLDB Endowment.
Source Title: Proceedings of the VLDB Endowment
URI: http://scholarbank.nus.edu.sg/handle/10635/77892
ISSN: 2150-8097
Appears in Collections: Staff Publications
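
The scan-sharing idea mentioned in the abstract can be illustrated with a toy, in-memory sketch. This is not the paper's algorithm and does not use Hadoop: the function `run_merged`, the two sample jobs, and the sample records are all hypothetical names introduced here for illustration. It only shows how merging a batch of jobs lets every job consume each input record from a single pass, instead of each job scanning the input separately; sharing of common map output and the materialization technique are not modeled.

```python
from collections import defaultdict


def run_merged(records, jobs):
    """Run several toy jobs over `records` using one shared pass.

    jobs maps a job id to a (map_fn, reduce_fn) pair, where
    map_fn(record) yields (key, value) pairs and
    reduce_fn(key, values) returns the reduced result for that key.
    Returns (results per job, number of records read).
    """
    # One group of intermediate (key -> values) data per job,
    # so the outputs of merged jobs stay separated by job id.
    grouped = {job_id: defaultdict(list) for job_id in jobs}
    records_read = 0
    for record in records:  # the single shared scan of the input
        records_read += 1
        for job_id, (map_fn, _reduce_fn) in jobs.items():
            for key, value in map_fn(record):
                grouped[job_id][key].append(value)
    results = {}
    for job_id, (_map_fn, reduce_fn) in jobs.items():
        results[job_id] = {
            key: reduce_fn(key, values)
            for key, values in grouped[job_id].items()
        }
    return results, records_read


def word_count_map(line):
    # Hypothetical job 1: emit (word, 1) for every token.
    for word in line.split():
        yield word, 1


def token_total_map(line):
    # Hypothetical job 2: emit the token count of each line under one key.
    yield "tokens", len(line.split())


def sum_reduce(_key, values):
    return sum(values)


records = ["a b a", "b c"]
jobs = {
    "word_count": (word_count_map, sum_reduce),
    "token_total": (token_total_map, sum_reduce),
}
results, records_read = run_merged(records, jobs)
```

Run separately, the two jobs would read the two input records twice each (four reads in total); the merged run reads each record once. In a real MapReduce deployment the saving would come from avoided disk and network I/O, which this sketch does not measure.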