Please use this identifier to cite or link to this item: https://doi.org/10.1145/2038916.2038928
DC Field | Value
dc.title | Query optimization for massively parallel data processing
dc.contributor.author | Wu, S.
dc.contributor.author | Li, F.
dc.contributor.author | Mehrotra, S.
dc.contributor.author | Ooi, B.C.
dc.date.accessioned | 2013-07-04T08:41:24Z
dc.date.available | 2013-07-04T08:41:24Z
dc.date.issued | 2011
dc.identifier.citation | Wu, S., Li, F., Mehrotra, S., Ooi, B.C. (2011). Query optimization for massively parallel data processing. Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011. ScholarBank@NUS Repository. https://doi.org/10.1145/2038916.2038928
dc.identifier.isbn | 9781450309769
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/42020
dc.description.abstract | MapReduce has been widely recognized as an efficient tool for large-scale data analysis. It achieves high performance by exploiting parallelism among processing nodes while providing a simple interface for upper-layer applications. Some vendors have enhanced their data warehouse systems by integrating MapReduce into the systems. However, existing MapReduce-based query processing systems, such as Hive, fall short of the query optimization and competency of conventional database systems. Given an SQL query, Hive translates the query into a set of MapReduce jobs sentence by sentence. This design assumes that the user can optimize his query before submitting it to the system. Unfortunately, manual query optimization is time consuming and difficult, even to an experienced database user or administrator. In this paper, we propose a query optimization scheme for MapReduce-based processing systems. Specifically, we embed into Hive a query optimizer which is designed to generate an efficient query plan based on our proposed cost model. Experiments carried out on our in-house cluster confirm the effectiveness of our query optimizer. Copyright 2011 ACM.
dc.description.uri | http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1145/2038916.2038928
dc.source | Scopus
dc.subject | Hive
dc.subject | MapReduce
dc.subject | Multi-way join
dc.subject | Query optimization
dc.type | Conference Paper
dc.contributor.department | COMPUTER SCIENCE
dc.description.doi | 10.1145/2038916.2038928
dc.description.sourcetitle | Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011
dc.identifier.isiut | NOT_IN_WOS
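The abstract says the optimizer picks a query plan from a cost model, but the record does not describe that model. The Python sketch below is therefore only a generic illustration of cost-based multi-way join ordering. It does not reproduce the authors' method. It contrasts a join order taken verbatim from a hypothetical user query with the cheapest left-deep order found by textbook Selinger-style dynamic programming. Several things are assumptions: one binary join per MapReduce job, a job cost of rows read plus rows written, and the rounded, TPC-H-like table sizes and selectivities. The names `SIZES`, `SELECTIVITY`, `output_rows`, `job_cost`, `order_cost`, and `best_left_deep_order` are all hypothetical.

```python
from itertools import combinations

# Illustrative row counts (rounded, TPC-H-like); not figures from the paper.
SIZES = {"lineitem": 6_000_000, "orders": 1_500_000, "customer": 150_000, "nation": 25}

# Illustrative join selectivities for table pairs that share a join predicate.
# Any pair not listed is joined as a cross product (selectivity 1.0).
SELECTIVITY = {
    frozenset({"lineitem", "orders"}): 1 / 1_500_000,
    frozenset({"orders", "customer"}): 1 / 150_000,
    frozenset({"customer", "nation"}): 1 / 25,
}


def output_rows(tables):
    """Estimated cardinality of joining a set of tables, assuming independent predicates."""
    rows = 1.0
    for t in tables:
        rows *= SIZES[t]
    for pair, sel in SELECTIVITY.items():
        if pair <= tables:  # predicate applies only if both tables are in the set
            rows *= sel
    return rows


def job_cost(left, right):
    """Assumed cost of one MapReduce join job: rows read from both inputs plus rows written."""
    return output_rows(left) + output_rows(right) + output_rows(left | right)


def order_cost(order):
    """Total cost of a left-deep plan joining tables in the given order, one job per join."""
    joined = frozenset([order[0]])
    total = 0.0
    for t in order[1:]:
        total += job_cost(joined, frozenset([t]))
        joined = joined | {t}
    return total


def best_left_deep_order(tables):
    """Selinger-style dynamic programming over table subsets for the cheapest left-deep order."""
    # best[subset] = (cost, order) of the cheapest left-deep plan covering that subset.
    best = {frozenset([t]): (0.0, (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        for subset in combinations(tables, k):
            s = frozenset(subset)
            candidates = []
            for t in subset:  # t is the table joined last
                rest = s - {t}
                cost, order = best[rest]
                candidates.append((cost + job_cost(rest, frozenset([t])), order + (t,)))
            best[s] = min(candidates)
    return best[frozenset(tables)]


if __name__ == "__main__":
    # Hypothetical order a user might write, translated one join at a time.
    written = ("nation", "lineitem", "orders", "customer")
    cost, order = best_left_deep_order(tuple(SIZES))
    print("written order:", written, f"cost={order_cost(written):,.0f}")
    print("chosen order: ", order, f"cost={cost:,.0f}")
```

In this toy setup, the written order starts with a cross product of `nation` and `lineitem`, so its intermediate results are far larger than those of the order the search selects. Restricting the search to left-deep plans keeps it to one pass over table subsets. A real optimizer for MapReduce, including the one the paper describes, may consider other plan shapes, such as multi-way joins within a single job, and a different cost function.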
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.