Parallelizing stateful operators in a distributed stream processing system: How, should you and how much?

Please use this identifier to cite or link to this item: https://doi.org/10.1145/2335484.2335515

DC Field	Value
dc.title	Parallelizing stateful operators in a distributed stream processing system: How, should you and how much?
dc.contributor.author	Wu, S.
dc.contributor.author	Kumar, V.
dc.contributor.author	Wu, K.-L.
dc.contributor.author	Ooi, B.C.
dc.date.accessioned	2013-07-04T08:08:22Z
dc.date.available	2013-07-04T08:08:22Z
dc.date.issued	2012
dc.identifier.citation	Wu, S., Kumar, V., Wu, K.-L., Ooi, B.C. (2012). Parallelizing stateful operators in a distributed stream processing system: How, should you and how much? Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems, DEBS'12: 278-289. ScholarBank@NUS Repository. https://doi.org/10.1145/2335484.2335515
dc.identifier.isbn	9781450313155
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/40614
dc.description.abstract	We consider a distributed stream processing application, expressed as a data-flow graph with operators as vertices connected by streams and deployed over a cluster of compute nodes, where a small subset of the operators are often the performance bottlenecks for the entire application. In cases where a bottleneck operator is stateless, it is obvious that parallelization by splitting the incoming stream among multiple parallel operators deployed on different nodes can help improve performance. However, it is not so obvious when the bottleneck operator is stateful. In such a case, parallelization is much more challenging as it often requires a state sharing mechanism for the parallel operators. Moreover, it incurs additional overheads of required accesses by the parallel operators to shared state and synchronization constructs. In this paper, we propose a parallelization framework for stateful stream processing operators. The framework not only addresses issues related to the system model and support for operator parallelization, but also delves into the theoretical details that model the suitability of parallelization and the optimal degree of parallelism. We have implemented and evaluated our framework in the context of IBM's System S distributed stream processing middleware. While microbenchmarks are used to validate the proposed theoretical model, a parallelized implementation of a moving KNN application is used for the purpose of evaluation. Copyright © 2012 ACM.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1145/2335484.2335515
dc.source	Scopus
dc.subject	Parallelization
dc.subject	Shared state
dc.subject	Stream processing
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1145/2335484.2335515
dc.description.sourcetitle	Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems, DEBS'12
dc.description.page	278-289
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.
