Title: S3: An efficient shared scan scheduler on MapReduce framework
Authors: Shi, L.
Li, X.
Tan, K.-L. 
Issue Date: 2011
Citation: Shi, L., Li, X., Tan, K.-L. (2011). S3: An efficient shared scan scheduler on MapReduce framework. Proceedings of the International Conference on Parallel Processing: 325-334. ScholarBank@NUS Repository.
Abstract: Hadoop, an open-source implementation of MapReduce, has been widely used for data-intensive computing. To improve performance, multiple jobs operating on a common data file can be processed as a batch to eliminate redundant scanning. However, in practice, jobs often do not arrive at the same time, and batching them means longer waiting times for jobs that arrive earlier. In this paper, we propose S3 - a novel Shared Scan Scheduler for Hadoop - which allows sharing the scan of a common file among multiple jobs that may arrive at different times. Under S3, a job is split into a sequence of (independent) sub-jobs, each operating on a different portion of the data file; moreover, multiple sub-jobs (from different jobs) that access a common portion of a data file can be processed as a batch to share the scan of the accessed data. S3 operates as follows: at any time, the system may be processing a batch of sub-jobs (that access the same portion of data); at the same time, there are sub-jobs waiting in a job queue; as a new job arrives, its sub-jobs can be aligned with the waiting jobs in the queue; once the current batch of sub-jobs completes processing, the next batch of sub-jobs (which may include sub-jobs from newly arrived jobs) can be initiated for processing. In this way, an arriving job does not need to wait for a long time to be processed. We have implemented our S3 approach in Hadoop, and our experimental results on a cluster of over 40 nodes show that S3 outperforms the naïve no-sharing scheme and the file-based shared-scan approach. © 2011 IEEE.
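Illustrative sketch (not the authors' implementation, and not Hadoop code): the scheduling idea described in the abstract can be approximated with a small simulation. It assumes the file is divided into equal-size segments, that scanning one segment takes one time step, and that the scan proceeds circularly so a late-arriving job joins at the segment currently due and wraps around for the segments it missed. The names Job and simulate_shared_scan are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    arrival: int                                 # time step at which the job arrives
    remaining: set = field(default_factory=set)  # segments this job still has to scan


def simulate_shared_scan(jobs, num_segments):
    """Simulate a circular shared scan (assumed model, one segment per step).

    Returns (schedule, finish), where schedule is a list of
    (step, segment, [job names served]) and finish maps job name -> finish step.
    """
    pending = sorted(jobs, key=lambda j: j.arrival)
    for job in pending:
        # Each job is split into one sub-job per file segment.
        job.remaining = set(range(num_segments))

    active, schedule, finish = [], [], {}
    step, segment = 0, 0
    while pending or active:
        # Admit jobs that have arrived; they align with the segment due next.
        while pending and pending[0].arrival <= step:
            active.append(pending.pop(0))
        if not active:
            step = pending[0].arrival            # idle until the next arrival
            continue
        # One scan of the segment serves every sub-job that needs it.
        batch = [j for j in active if segment in j.remaining]
        schedule.append((step, segment, [j.name for j in batch]))
        for j in batch:
            j.remaining.discard(segment)
            if not j.remaining:
                finish[j.name] = step + 1
        active = [j for j in active if j.remaining]
        segment = (segment + 1) % num_segments
        step += 1
    return schedule, finish


if __name__ == "__main__":
    schedule, finish = simulate_shared_scan([Job("A", 0), Job("B", 2)], 4)
    for entry in schedule:
        print(entry)
    print("segment scans:", len(schedule), "finish times:", finish)
```

In this toy model, job B arrives while job A is halfway through the file, shares the last two segment scans with A, and then wraps around for the two segments it missed. The file is scanned segment by segment fewer times than if each job scanned it independently, and B starts work immediately instead of waiting for A to finish.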
Source Title: Proceedings of the International Conference on Parallel Processing
ISBN: 9780769545103
ISSN: 01903918
DOI: 10.1109/ICPP.2011.42
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.