|Title:||Dataflow detection and applications to workflow scheduling|
|Authors:||Wang, Y., Lu, P.|
|Source:||Wang, Y., Lu, P. (2011-08-10). Dataflow detection and applications to workflow scheduling. Concurrency Computation Practice and Experience 23 (11) : 1261-1283. ScholarBank@NUS Repository. https://doi.org/10.1002/cpe.1708|
|Abstract:||In high-performance computing (HPC) workloads (i.e. the set of computations to be completed), the same computational workflow of jobs (e.g. a Pipeline, a Fork&Join, or a Lattice graph) may be applied to different input files and parameters. Each of these workflow instances has the same workflow shape, but accesses (possibly) separate input, intermediate, and output files. Therefore, the selective isolation of each workflow instance can be important for maximizing scheduling flexibility and performance. However, in practice, realizing this benefit is not obvious due to a variety of problems and constraints. For example, the unmediated interaction of different workflow instances can lead to a problem of filename conflicts between concurrent workflow instances overwriting common files, which, for a control-flow driven batch scheduler, may result in either unsafe computation of the multiple instances in the same sub-directory or storage overheads when multiple directories are used. We propose a novel approach of selectively coupling and integrating job schedulers and file systems, known as a Workflow-aware File System (WaFS), with two major benefits. First, separate namespaces can be constructed on a per-instance basis to maximize the concurrency of workflow instances, despite filename conflicts, while minimizing storage overhead. Second, exploiting inferred dataflow information, trade-offs can be made between makespan and storage overhead while maintaining correctness. Through a simulation-based study, we have shown the potential benefits of WaFS to job concurrency and we have characterized the trade-offs that can be made between storage overhead and performance. New scheduling policies, Versioned Namespace (VNS), Overwrite-Safe Concurrency (OSC) and hybrids, are made possible by WaFS, with different advantages and disadvantages. © 2011 John Wiley & Sons, Ltd.|
|Source Title:||Concurrency Computation Practice and Experience|
|Appears in Collections:||Staff Publications|