Llama: Leveraging columnar storage for scalable join processing in the MapReduce framework

Please use this identifier to cite or link to this item: https://doi.org/10.1145/1989323.1989424

DC Field	Value
dc.title	Llama: Leveraging columnar storage for scalable join processing in the MapReduce framework
dc.contributor.author	Lin, Y.
dc.contributor.author	Agrawal, D.
dc.contributor.author	Chen, C.
dc.contributor.author	Ooi, B.C.
dc.contributor.author	Wu, S.
dc.date.accessioned	2013-07-04T08:06:59Z
dc.date.available	2013-07-04T08:06:59Z
dc.date.issued	2011
dc.identifier.citation	Lin, Y., Agrawal, D., Chen, C., Ooi, B.C., Wu, S. (2011). Llama: Leveraging columnar storage for scalable join processing in the MapReduce framework. Proceedings of the ACM SIGMOD International Conference on Management of Data: 961-972. ScholarBank@NUS Repository. https://doi.org/10.1145/1989323.1989424
dc.identifier.isbn	9781450306614
dc.identifier.issn	07308078
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/40553
dc.description.abstract	To achieve high reliability and scalability, most large-scale data warehouse systems have adopted the cluster-based architecture. In this paper, we propose the design of a new cluster-based data warehouse system, Llama, a hybrid data management system which combines the features of row-wise and column-wise database systems. In Llama, columns are formed into correlation groups to provide the basis for the vertical partitioning of tables. Llama employs a distributed file system (DFS) to disseminate data among cluster nodes. On top of the DFS, Llama supports a MapReduce-based query engine. We design a new join algorithm to facilitate fast join processing. We present a performance study on the TPC-H dataset and compare Llama with Hive, a data warehouse infrastructure built on top of Hadoop. The experiments are conducted on Amazon EC2. The results show that Llama has excellent load performance and that its query performance is significantly better than that of the traditional MapReduce framework based on row-wise storage. © 2011 ACM.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1145/1989323.1989424
dc.source	Scopus
dc.subject	column store
dc.subject	join
dc.subject	MapReduce
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1145/1989323.1989424
dc.description.sourcetitle	Proceedings of the ACM SIGMOD International Conference on Management of Data
dc.description.page	961-972
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.