Please use this identifier to cite or link to this item: https://doi.org/10.1002/cpe.1708
Title: Dataflow detection and applications to workflow scheduling
Authors: Wang, Y. 
Lu, P.
Keywords: concurrency
dataflow
storage
Issue Date: 10-Aug-2011
Source: Wang, Y., Lu, P. (2011-08-10). Dataflow detection and applications to workflow scheduling. Concurrency Computation Practice and Experience 23 (11) : 1261-1283. ScholarBank@NUS Repository. https://doi.org/10.1002/cpe.1708
Abstract: In high-performance computing (HPC) workloads (i.e. the set of computations to be completed), the same computational workflow of jobs (e.g. a Pipeline, a Fork&Join, or a Lattice graph) may be applied to different input files and parameters. Each of these workflow instances has the same workflow shape, but accesses (possibly) separate input, intermediate, and output files. Therefore, the selective isolation of each workflow instance can be important for maximizing scheduling flexibility and performance. However, in practice, realizing this benefit is not obvious due to a variety of problems and constraints. For example, the unmediated interaction of different workflow instances can lead to a problem of filename conflicts between concurrent workflow instances overwriting common files, which, for a control-flow driven batch scheduler, may result in either unsafe computation of the multiple instances in the same sub-directory or storage overheads when multiple directories are used. We propose a novel approach of selectively coupling and integrating job schedulers and file systems, known as a Workflow-aware File System (WaFS), with two major benefits. First, separate namespaces can be constructed on a per-instance basis to maximize the concurrency of workflow instances, despite filename conflicts, while minimizing storage overhead. Second, exploiting inferred dataflow information, trade-offs can be made between makespan and storage overhead while maintaining correctness. Through a simulation-based study, we have shown the potential benefits of WaFS to job concurrency and we have characterized the trade-offs that can be made between storage overhead and performance. New scheduling policies, Versioned Namespace (VNS), Overwrite-Safe Concurrency (OSC) and hybrids, are made possible by WaFS, with different advantages and disadvantages. © 2011 John Wiley & Sons, Ltd.
Source Title: Concurrency Computation Practice and Experience
URI: http://scholarbank.nus.edu.sg/handle/10635/55496
ISSN: 1532-0626
DOI: 10.1002/cpe.1708
Appears in Collections:Staff Publications

Files in This Item:
There are no files associated with this item.

Scopus citations: 10 (checked on Dec 6, 2017)
Web of Science citations: 8 (checked on Nov 21, 2017)
Page view(s): 29 (checked on Dec 10, 2017)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.