Please use this identifier to cite or link to this item: https://doi.org/10.1109/ICDM.2006.132
Title: Rapid identification of column heterogeneity
Authors: Dai, B.T.; Koudas, N.; Ooi, B.C.; Srivastava, D.; Venkatasubramanian, S.
Issue Date: 2007
Citation: Dai, B.T., Koudas, N., Ooi, B.C., Srivastava, D., Venkatasubramanian, S. (2007). Rapid identification of column heterogeneity. Proceedings - IEEE International Conference on Data Mining, ICDM: 159-170. ScholarBank@NUS Repository. https://doi.org/10.1109/ICDM.2006.132
Abstract: Data quality is a serious concern in every data management application, and a variety of quality measures have been proposed, e.g., accuracy, freshness and completeness, to capture common sources of data quality degradation. We identify and focus attention on a novel measure, column heterogeneity, that seeks to quantify the data quality problems that can arise when merging data from different sources. We identify desiderata that a column heterogeneity measure should intuitively satisfy, and describe our technique to quantify database column heterogeneity based on using a novel combination of cluster entropy and soft clustering. Finally, we present detailed experimental results, using diverse data sets of different types, to demonstrate that our approach provides a robust mechanism for identifying and quantifying database column heterogeneity. © 2006 IEEE.
Source Title: Proceedings - IEEE International Conference on Data Mining, ICDM
URI: http://scholarbank.nus.edu.sg/handle/10635/41397
ISBN: 0769527019
ISSN: 15504786
DOI: 10.1109/ICDM.2006.132
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.