Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/39363
Title: Using interval association rules to identify dubious data values
Authors: Lu, R.
Lee, M.L. 
Hsu, W. 
Issue Date: 2004
Citation: Lu, R.,Lee, M.L.,Hsu, W. (2004). Using interval association rules to identify dubious data values. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3129 : 528-538. ScholarBank@NUS Repository.
Abstract: A hard-to-catch erroneous data is one whose value looks perfectly legitimate. Yet, if we examine this value in conjunction with other attribute values, the value appear questionable. Detecting such dubious values is a major problem in data cleaning. This paper presents a framework to automatically detect dubious data values in the datasets. Data is first pre-processed by data smoothing and mapping. Next, interval association rules are generated which involved data partitioning via clustering before the rules are generated using an Apriori algorithm. Finally, these rules are used to identify data values that fall outside the expected intervals. Experiment results show that the proposed framework is able to accurately and efficiently dubious values in large datasets. © Springer-Verlag 2004.
Source Title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
URI: http://scholarbank.nus.edu.sg/handle/10635/39363
ISSN: 03029743
Appears in Collections:Staff Publications

Show full item record
Files in This Item:
There are no files associated with this item.

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.