|Title:||The performance of MapReduce: An in-depth study|
|Authors:||Jiang, D.; Ooi, B.C.; Shi, L.; Wu, S.|
|Source:||Jiang, D., Ooi, B.C., Shi, L., Wu, S. (2010). The performance of MapReduce: An in-depth study. Proceedings of the VLDB Endowment 3 (1): 472-483. ScholarBank@NUS Repository.|
|Abstract:||MapReduce has been widely used for large-scale data analysis in the Cloud. The system is well recognized for its elastic scalability and fine-grained fault tolerance, although its performance has been noted to be suboptimal in the database context. According to a recent study, Hadoop, an open source implementation of MapReduce, is slower than two state-of-the-art parallel database systems in performing a variety of analytical tasks by a factor of 3.1 to 6.5. MapReduce can achieve better performance with the allocation of more compute nodes from the cloud to speed up computation; however, this approach of "renting more nodes" is not cost effective in a pay-as-you-go environment. Users desire an economical, elastically scalable data processing system, and are therefore interested in whether MapReduce can offer both elastic scalability and efficiency. In this paper, we conduct a performance study of MapReduce (Hadoop) on a 100-node cluster of Amazon EC2 with various levels of parallelism. We identify five design factors that affect the performance of Hadoop, and investigate alternative but known methods for each factor. We show that by carefully tuning these factors, the overall performance of Hadoop can be improved by a factor of 2.5 to 3.5 for the same benchmark used in that study, and is thus more comparable to that of parallel database systems. Our results show that it is therefore possible to build a cloud data processing system that is both elastically scalable and efficient. © 2010 VLDB Endowment.|
|Source Title:||Proceedings of the VLDB Endowment|
|Appears in Collections:||Staff Publications|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.