Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/78084
Title: DeepDetect: An eXtensible system for detecting attribute outliers & duplicates in XML
Authors: Lau, Q.P.; Hsu, W.; Koh, J.L.Y.; Lee, M.L.
Keywords: Attribute outlier detection; Data cleaning; Data quality; Duplicate detection; XML
Issue Date: 2009
Citation: Lau, Q.P., Hsu, W., Koh, J.L.Y., Lee, M.L. (2009). DeepDetect: An eXtensible system for detecting attribute outliers & duplicates in XML. Data Quality and High-Dimensional Data Analysis - Proceedings of the DASFAA 2008 Workshops: 6-20. ScholarBank@NUS Repository.
Abstract: XML, the eXtensible Markup Language, is fast evolving into the new standard for data representation and exchange on the WWW. This has resulted in a growing number of data cleaning techniques to locate "dirty" data (artifacts). In this paper, we present DeepDetect - an extensible system that detects attribute outliers and duplicates in XML documents. Attribute outlier detection finds objects that contain deviating values with respect to a relevant group of objects. This entails utilizing the correlation among element values in a given XML document. Duplicate detection in XML requires the identification of sub-trees that correspond to real world objects. Our system architecture enables sharing of common operations that prepare XML data for the various artifact detection techniques. DeepDetect also provides an intuitive visual interface for the user to specify various parameters for preprocessing and detection, as well as to view results.
Source Title: Data Quality and High-Dimensional Data Analysis - Proceedings of the DASFAA 2008 Workshops
URI: http://scholarbank.nus.edu.sg/handle/10635/78084
ISBN: 9814273481
Appears in Collections: Staff Publications