|Title:||Multiquery optimization in mapreduce framework|
|Citation:||Wang, G., Chan, C.Y. (2013-11). Multiquery optimization in mapreduce framework. Proceedings of the VLDB Endowment 7 (3) : 145-156. ScholarBank@NUS Repository.|
|Abstract:||MapReduce has recently emerged as a new paradigm for large-scale data analysis due to its high scalability, fine-grained fault tolerance and easy programming model. Since different jobs often share similar work (e.g., several jobs scan the same input file or produce the same map output), there are many opportunities to optimize the performance for a batch of jobs. In this paper, we propose two new techniques for multi-job optimization in the MapReduce framework. The first is a generalized grouping technique (which generalizes the recently proposed MRShare technique) that merges multiple jobs into a single job thereby enabling the merged jobs to share both the scan of the input file as well as the communication of the common map output. The second is a materialization technique that enables multiple jobs to share both the scan of the input file as well as the communication of the common map output via partial materialization of the map output of some jobs (in the map and/or reduce phase). Our second contribution is the proposal of a new optimization algorithm that given an input batch of jobs, produces an optimal plan by a judicious partitioning of the jobs into groups and an optimal assignment of the processing technique to each group. Our experimental results on Hadoop demonstrate that our new approach significantly outperforms the state-of-the-art technique, MRShare, by up to 107%. © 2013 VLDB Endowment.|
|Source Title:||Proceedings of the VLDB Endowment|
|Appears in Collections:||Staff Publications|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.