Query optimization for massively parallel data processing

Please use this identifier to cite or link to this item: https://doi.org/10.1145/2038916.2038928

DC Field	Value
dc.title	Query optimization for massively parallel data processing
dc.contributor.author	Wu, S.
dc.contributor.author	Li, F.
dc.contributor.author	Mehrotra, S.
dc.contributor.author	Ooi, B.C.
dc.date.accessioned	2013-07-04T08:41:24Z
dc.date.available	2013-07-04T08:41:24Z
dc.date.issued	2011
dc.identifier.citation	Wu, S., Li, F., Mehrotra, S., Ooi, B.C. (2011). Query optimization for massively parallel data processing. Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011. ScholarBank@NUS Repository. https://doi.org/10.1145/2038916.2038928
dc.identifier.isbn	9781450309769
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/42020
dc.description.abstract	MapReduce has been widely recognized as an efficient tool for large-scale data analysis. It achieves high performance by exploiting parallelism among processing nodes while providing a simple interface for upper-layer applications. Some vendors have enhanced their data warehouse systems by integrating MapReduce into the systems. However, existing MapReduce-based query processing systems, such as Hive, fall short of the query optimization and competency of conventional database systems. Given an SQL query, Hive translates the query into a set of MapReduce jobs sentence by sentence. This design assumes that the user can optimize his query before submitting it to the system. Unfortunately, manual query optimization is time consuming and difficult, even to an experienced database user or administrator. In this paper, we propose a query optimization scheme for MapReduce-based processing systems. Specifically, we embed into Hive a query optimizer which is designed to generate an efficient query plan based on our proposed cost model. Experiments carried out on our in-house cluster confirm the effectiveness of our query optimizer. Copyright 2011 ACM.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1145/2038916.2038928
dc.source	Scopus
dc.subject	Hive
dc.subject	MapReduce
dc.subject	Multi-way join
dc.subject	Query optimization
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1145/2038916.2038928
dc.description.sourcetitle	Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.
