Please use this identifier to cite or link to this item:
https://doi.org/10.1109/ICDM.2006.132
Title: Rapid identification of column heterogeneity
Authors: Dai, B.T.; Koudas, N.; Ooi, B.C.; Srivastava, D.; Venkatasubramanian, S.
Issue Date: 2007
Citation: Dai, B.T., Koudas, N., Ooi, B.C., Srivastava, D., Venkatasubramanian, S. (2007). Rapid identification of column heterogeneity. Proceedings - IEEE International Conference on Data Mining, ICDM: 159-170. ScholarBank@NUS Repository. https://doi.org/10.1109/ICDM.2006.132
Abstract: Data quality is a serious concern in every data management application, and a variety of quality measures have been proposed, e.g., accuracy, freshness and completeness, to capture common sources of data quality degradation. We identify and focus attention on a novel measure, column heterogeneity, that seeks to quantify the data quality problems that can arise when merging data from different sources. We identify desiderata that a column heterogeneity measure should intuitively satisfy, and describe our technique to quantify database column heterogeneity based on using a novel combination of cluster entropy and soft clustering. Finally, we present detailed experimental results, using diverse data sets of different types, to demonstrate that our approach provides a robust mechanism for identifying and quantifying database column heterogeneity. © 2006 IEEE.
Source Title: Proceedings - IEEE International Conference on Data Mining, ICDM
URI: http://scholarbank.nus.edu.sg/handle/10635/41397
ISBN: 0769527019
ISSN: 15504786
DOI: 10.1109/ICDM.2006.132
Appears in Collections: Staff Publications