Automatic discovery of attributes in relational databases

Please use this identifier to cite or link to this item: https://doi.org/10.1145/1989323.1989336

DC Field	Value
dc.title	Automatic discovery of attributes in relational databases
dc.contributor.author	Zhang, M.
dc.contributor.author	Hadjieleftheriou, M.
dc.contributor.author	Ooi, B.C.
dc.contributor.author	Procopiuc, C.M.
dc.contributor.author	Srivastava, D.
dc.date.accessioned	2013-07-04T08:41:29Z
dc.date.available	2013-07-04T08:41:29Z
dc.date.issued	2011
dc.identifier.citation	Zhang, M., Hadjieleftheriou, M., Ooi, B.C., Procopiuc, C.M., Srivastava, D. (2011). Automatic discovery of attributes in relational databases. Proceedings of the ACM SIGMOD International Conference on Management of Data: 109-120. ScholarBank@NUS Repository. https://doi.org/10.1145/1989323.1989336
dc.identifier.isbn	9781450306614
dc.identifier.issn	0730-8078
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/42024
dc.description.abstract	In this work we design algorithms for clustering relational columns into attributes, i.e., for identifying strong relationships between columns based on the common properties and characteristics of the values they contain. For example, identifying whether a certain set of columns refers to telephone numbers versus social security numbers, or names of customers versus names of nations. Traditional relational database schema languages use very limited primitive data types and simple foreign key constraints to express relationships between columns. Object oriented schema languages allow the definition of custom data types; still, certain relationships between columns might be unknown at design time or they might appear only in a particular database instance. Nevertheless, these relationships are an invaluable tool for schema matching, and generally for better understanding and working with the data. Here, we introduce data oriented solutions (we do not consider solutions that assume the existence of any external knowledge) that use statistical measures to identify strong relationships between the values of a set of columns. Interpreting the database as a graph where nodes correspond to database columns and edges correspond to column relationships, we decompose the graph into connected components and cluster sets of columns into attributes. To test the quality of our solution, we also provide a comprehensive experimental evaluation using real and synthetic datasets. © 2011 ACM.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1145/1989323.1989336
dc.source	Scopus
dc.subject	attribute discovery
dc.subject	schema matching
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1145/1989323.1989336
dc.description.sourcetitle	Proceedings of the ACM SIGMOD International Conference on Management of Data
dc.description.page	109-120
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications
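
The abstract describes treating columns as graph nodes, relationships as edges, and the connected components as attributes. A minimal sketch of that idea follows. It is not the authors' implementation: the abstract only says "statistical measures", so the Jaccard similarity of value sets, the 0.5 threshold, the function names, and the sample columns are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two value sets (an assumed stand-in measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_columns(columns, threshold=0.5):
    """Group column names into connected components of a similarity graph.

    columns: dict mapping column name -> iterable of values.
    threshold: hypothetical cut-off at or above which two columns share an edge.
    """
    # Union-find forest: every column starts as its own component.
    parent = {name: name for name in columns}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add an edge (merge components) for every sufficiently similar pair.
    for u, v in combinations(columns, 2):
        if jaccard(columns[u], columns[v]) >= threshold:
            parent[find(u)] = find(v)

    # Collect the members of each component.
    groups = {}
    for name in columns:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample data: two phone columns that overlap, two name columns that do not.
columns = {
    "customer.phone": ["555-0101", "555-0102", "555-0103"],
    "orders.contact_phone": ["555-0101", "555-0102", "555-0199"],
    "customer.name": ["Ada", "Grace", "Edsger"],
    "nation.name": ["Peru", "Kenya", "Japan"],
}
clusters = cluster_columns(columns)
```

With this sample data the two phone columns end up in one cluster, while the customer-name and nation-name columns stay separate because their value sets do not overlap. Comparing every pair of columns is quadratic in the number of columns, which is acceptable for a sketch but would likely need pruning on a large schema.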

Files in This Item:

There are no files associated with this item.
