Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/31584
Title: On column heterogeneity
Authors: DAI BINGTIAN
Keywords: Column Heterogeneity, Data Integration, Data Quality, Probabilistic Tagging, Semantic Inference, Column Matching
Issue Date: 18-Aug-2011
Citation: DAI BINGTIAN (2011-08-18). On column heterogeneity. ScholarBank@NUS Repository.
Abstract: Database columns are the basic units of a database; they determine how databases are designed and how database queries are processed. In this thesis, we study one particular property of database columns: column heterogeneity. Our study consists of three parts. The first part addresses intra-column heterogeneity, i.e., the heterogeneity within a single database column. The measure of column heterogeneity is defined based on the syntactic types of the values in a column. The subsequent two parts discuss inter-column heterogeneity, i.e., the heterogeneity that arises when multiple database columns are taken into consideration. We propose validating schema matching in order to prevent database columns from becoming more heterogeneous. The schema matching validator includes a measure of integratability, a procedure for extracting sub-string matches, and an invalidation certificate that serves as evidence that two columns are not integratable. The last part of this thesis focuses on a problem that arises from the data management of emerging community databases. An out-of-the-box approach that separates users' actions from database operations is proposed. This approach evaluates the heterogeneity across different database columns and thereby makes semantic inferences among those columns. Together, these parts show how to deal with column heterogeneity in databases.
URI: http://scholarbank.nus.edu.sg/handle/10635/31584
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: DaiBT.pdf
Size: 1.52 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.