Title: Rapid identification of column heterogeneity
Authors: Dai, B.T.; Koudas, N.; Ooi, B.C.; Srivastava, D.; Venkatasubramanian, S.
Issue Date: 2007
Source: Dai, B.T., Koudas, N., Ooi, B.C., Srivastava, D., Venkatasubramanian, S. (2007). Rapid identification of column heterogeneity. Proceedings - IEEE International Conference on Data Mining, ICDM: 159-170. ScholarBank@NUS Repository.
Abstract: Data quality is a serious concern in every data management application, and a variety of quality measures have been proposed, e.g., accuracy, freshness and completeness, to capture common sources of data quality degradation. We identify and focus attention on a novel measure, column heterogeneity, that seeks to quantify the data quality problems that can arise when merging data from different sources. We identify desiderata that a column heterogeneity measure should intuitively satisfy, and describe our technique to quantify database column heterogeneity based on using a novel combination of cluster entropy and soft clustering. Finally, we present detailed experimental results, using diverse data sets of different types, to demonstrate that our approach provides a robust mechanism for identifying and quantifying database column heterogeneity. © 2006 IEEE.
Source Title: Proceedings - IEEE International Conference on Data Mining, ICDM
ISBN: 0769527019
ISSN: 15504786
DOI: 10.1109/ICDM.2006.132
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.