Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/78084
Title: DeepDetect: An eXtensible system for detecting attribute outliers & duplicates in XML
Authors: Lau, Q.P.; Hsu, W.; Koh, J.L.Y.; Lee, M.L.
Keywords: Attribute outlier detection; Data cleaning; Data quality; Duplicate detection; XML
Issue Date: 2009
Source: Lau, Q.P., Hsu, W., Koh, J.L.Y., Lee, M.L. (2009). DeepDetect: An eXtensible system for detecting attribute outliers & duplicates in XML. Data Quality and High-Dimensional Data Analysis - Proceedings of the DASFAA 2008 Workshops: 6-20. ScholarBank@NUS Repository.
Abstract: XML, the eXtensible Markup Language, is fast evolving into the new standard for data representation and exchange on the WWW. This has resulted in a growing number of data cleaning techniques to locate "dirty" data (artifacts). In this paper, we present DeepDetect, an extensible system that detects attribute outliers and duplicates in XML documents. Attribute outlier detection finds objects that contain deviating values with respect to a relevant group of objects. This entails utilizing the correlation among element values in a given XML document. Duplicate detection in XML requires the identification of sub-trees that correspond to real world objects. Our system architecture enables sharing of common operations that prepare XML data for the various artifact detection techniques. DeepDetect also provides an intuitive visual interface for the user to specify various parameters for preprocessing and detection, as well as to view results.
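The abstract does not spell out DeepDetect's algorithms, so the following is only a minimal illustrative sketch of the two artifact types it names, not the authors' method. It assumes a shared preprocessing step that flattens each object sub-tree into a record of normalized child values (loosely mirroring the "common operations" the abstract mentions). Attribute outliers are then approximated as values that disagree with the dominant value among objects sharing a correlated element, and duplicates as objects with identical normalized key values. All function names, the `min_support` threshold and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def extract_objects(xml_text, object_tag):
    """Hypothetical shared preprocessing: one flat dict per object sub-tree,
    mapping child tag -> lower-cased, whitespace-collapsed text."""
    root = ET.fromstring(xml_text)
    objects = []
    for node in root.iter(object_tag):
        record = {}
        for child in node:
            record[child.tag] = " ".join((child.text or "").split()).lower()
        objects.append(record)
    return objects


def attribute_outliers(objects, group_by, target, min_support=0.75):
    """Flag objects whose `target` value deviates from the dominant value
    among objects that share the same (correlated) `group_by` value."""
    groups = defaultdict(list)
    for i, obj in enumerate(objects):
        if group_by in obj and target in obj:
            groups[obj[group_by]].append(i)
    outliers = []
    for members in groups.values():
        value, freq = Counter(objects[i][target] for i in members).most_common(1)[0]
        # Only trust the group's norm when it is strongly dominant.
        if freq / len(members) >= min_support:
            outliers.extend(i for i in members if objects[i][target] != value)
    return sorted(outliers)


def duplicates(objects, keys):
    """Group indices of objects whose normalized values on `keys` match exactly
    (a simplification; real XML duplicate detection uses similarity measures)."""
    buckets = defaultdict(list)
    for i, obj in enumerate(objects):
        buckets[tuple(obj.get(k, "") for k in keys)].append(i)
    return [ids for ids in buckets.values() if len(ids) > 1]


SAMPLE = """
<catalog>
  <book><title>XML Basics</title><publisher>Acme</publisher><city>Boston</city></book>
  <book><title>Data Cleaning</title><publisher>Acme</publisher><city>Boston</city></book>
  <book><title>Query Tuning</title><publisher>Acme</publisher><city>Boston</city></book>
  <book><title>Outliers 101</title><publisher>Acme</publisher><city>Bostn</city></book>
  <book><title>xml  basics</title><publisher>ACME</publisher><city>Boston</city></book>
</catalog>
"""

if __name__ == "__main__":
    objs = extract_objects(SAMPLE, "book")
    print(attribute_outliers(objs, "publisher", "city"))  # [3]
    print(duplicates(objs, ["title", "publisher"]))       # [[0, 4]]
```

Here the misspelled "Bostn" is flagged because 4 of the 5 "acme" books agree on "boston" (0.8 >= 0.75), and the fifth book collapses to the same normalized title and publisher as the first. Reusing one `extract_objects` output for both detectors is the design point: preprocessing is done once and shared.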
Source Title: Data Quality and High-Dimensional Data Analysis - Proceedings of the DASFAA 2008 Workshops
URI: http://scholarbank.nus.edu.sg/handle/10635/78084
ISBN: 9814273481
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.