Title: Efficient and effective data cleansing for large database
Authors: LI ZHAO
Keywords: Data Cleansing, Similarity, Detection Method, Anchor Record, Filtering Scheme, Dynamic Similarity
Issue Date: 9-Jun-2004
Citation: LI ZHAO (2004-06-09). Efficient and effective data cleansing for large database. ScholarBank@NUS Repository.
Abstract: Data cleansing has recently received much attention in data warehousing, database integration, and data mining. It consists of two main components: a detection method and a comparison method. In this thesis, we study several problems in data cleansing, propose new detection methods, and extend existing comparison methods. We first propose two new efficient data cleansing methods, RAR1 and RAR2, that take into account the dependency between the detection method and the comparison method. Since comparison methods are generally very costly, we then introduce a fast filtering scheme that further improves performance. Finally, we present a dynamic similarity method that extends existing comparison methods to handle fields with NULL values well. A performance study shows that our new approaches significantly improve results in both efficiency and accuracy.
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: LIZHAO-PhD_Thesis.pdf; Size: 570.11 kB; Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.