Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/13427
Title: Correlation-based methods for data cleaning, with application to biological databases
Authors: KOH LIE YONG
Keywords: data cleaning, correlation mining, biological data, data artifacts, duplicate detection, outlier detection
Issue Date: 25-Sep-2007
Source: KOH LIE YONG (2007-09-25). Correlation-based methods for data cleaning, with application to biological databases. ScholarBank@NUS Repository.
Abstract: Data cleaning aims to improve data quality by detecting and eliminating data artifacts that hamper the efficacy of analysis or data mining. Despite its importance, data cleaning remains neglected in certain knowledge-driven domains such as Bioinformatics. An in-depth study of real-world biological databases indicates that the biological data quality problem is multi-factorial and requires a number of different data cleaning approaches. Current data cleaning approaches that derive observations of data artifacts from the attribute values are inadequate. This thesis exploits the correlation patterns between attributes to provide additional information about the relationships embedded within a data set for data cleaning. We propose three novel correlation-based data cleaning methods to detect outliers and duplicates, and apply them to biological databases as proofs of concept. Experimental results show the effectiveness of these correlation-based data cleaning methods in detecting data artifacts that existing approaches fall short of addressing.
URI: http://scholarbank.nus.edu.sg/handle/10635/13427
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: KOHJLY.pdf
Size: 1.92 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Page view(s): 266 (checked on Dec 11, 2017)
Download(s): 260 (checked on Dec 11, 2017)



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.