Finding representative set from massive data

Please use this identifier to cite or link to this item: https://doi.org/10.1109/ICDM.2005.69

DC Field	Value
dc.title	Finding representative set from massive data
dc.contributor.author	Pan, F.
dc.contributor.author	Wang, W.
dc.contributor.author	Tung, A.K.H.
dc.contributor.author	Yang, J.
dc.date.accessioned	2013-07-04T08:11:03Z
dc.date.available	2013-07-04T08:11:03Z
dc.date.issued	2005
dc.identifier.citation	Pan, F., Wang, W., Tung, A.K.H., Yang, J. (2005). Finding representative set from massive data. Proceedings - IEEE International Conference on Data Mining, ICDM: 338-345. ScholarBank@NUS Repository. https://doi.org/10.1109/ICDM.2005.69
dc.identifier.isbn	0769522785
dc.identifier.issn	15504786
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/40731
dc.description.abstract	In the information age, data is pervasive. In some applications, data explosion is a significant phenomenon. The massive data volume poses challenges to both human users and computers. In this project, we propose a new model for identifying a representative set from a large database. A representative set is a special subset of the original dataset with three main characteristics: (1) it is significantly smaller than the original dataset; (2) it captures the most information from the original dataset among subsets of the same size; (3) it has low redundancy among the representatives it contains. We use information-theoretic measures such as mutual information and relative entropy to measure the representativeness of the representative set. We first design a greedy algorithm and then present a heuristic algorithm that delivers much better performance. We run experiments on two real datasets and evaluate the effectiveness of our representative set in terms of coverage and accuracy. The experiments show that our representative set attains the expected characteristics and captures information more efficiently. © 2005 IEEE.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1109/ICDM.2005.69
dc.source	Scopus
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1109/ICDM.2005.69
dc.description.sourcetitle	Proceedings - IEEE International Conference on Data Mining, ICDM
dc.description.page	338-345
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.
