Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/14069
Title: Efficient and effective data cleansing for large database
Authors: LI ZHAO
Keywords: Data Cleansing, Similarity, Detection Method, Anchor Record, Filtering Scheme, Dynamic Similarity
Issue Date: 9-Jun-2004
Source: LI ZHAO (2004-06-09). Efficient and effective data cleansing for large database. ScholarBank@NUS Repository.
Abstract: Data cleansing has recently received much attention in data warehousing, database integration, and data mining. It consists of two main components: a detection method and a comparison method. In this thesis, we study several problems in data cleansing, propose new detection methods, and extend existing comparison methods. We first propose two new efficient data cleansing methods, RAR1 and RAR2, that take into account the dependency between the detection method and the comparison method. Since comparison methods are generally very costly, we then introduce a fast filtering scheme that further improves performance. Finally, we present a dynamic similarity method that extends existing comparison methods to handle fields with NULL values. A performance study shows that our new approaches significantly improve results in both efficiency and accuracy.
URI: http://scholarbank.nus.edu.sg/handle/10635/14069
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: LIZHAO-PhD_Thesis.pdf
Size: 570.11 kB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Page view(s): 206 (checked on Dec 11, 2017)
Download(s): 231 (checked on Dec 11, 2017)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.