Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/77892
Title: Multiquery optimization in MapReduce framework
Authors: Wang, G.; Chan, C.Y.
Issue Date: Nov-2013
Citation: Wang, G., Chan, C.Y. (2013-11). Multiquery optimization in MapReduce framework. Proceedings of the VLDB Endowment 7 (3): 145-156. ScholarBank@NUS Repository.
Abstract: MapReduce has recently emerged as a new paradigm for large-scale data analysis due to its high scalability, fine-grained fault tolerance and easy programming model. Since different jobs often share similar work (e.g., several jobs scan the same input file or produce the same map output), there are many opportunities to optimize the performance for a batch of jobs. In this paper, we propose two new techniques for multi-job optimization in the MapReduce framework. The first is a generalized grouping technique (which generalizes the recently proposed MRShare technique) that merges multiple jobs into a single job, thereby enabling the merged jobs to share both the scan of the input file and the communication of the common map output. The second is a materialization technique that enables multiple jobs to share both the scan of the input file and the communication of the common map output via partial materialization of the map output of some jobs (in the map and/or reduce phase). Our second contribution is the proposal of a new optimization algorithm that, given an input batch of jobs, produces an optimal plan by a judicious partitioning of the jobs into groups and an optimal assignment of the processing technique to each group. Our experimental results on Hadoop demonstrate that our new approach significantly outperforms the state-of-the-art technique, MRShare, by up to 107%. © 2013 VLDB Endowment.
Source Title: Proceedings of the VLDB Endowment
URI: http://scholarbank.nus.edu.sg/handle/10635/77892
ISSN: 2150-8097
Appears in Collections: Staff Publications
