Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/135440
Title: PROFILING ENTITIES OVER TIME WITH UNRELIABLE SOURCES
Authors: LI FURONG
Keywords: data integration, entity profiling, record linkage, truth discovery
Issue Date: 19-Dec-2016
Source: LI FURONG (2016-12-19). PROFILING ENTITIES OVER TIME WITH UNRELIABLE SOURCES. ScholarBank@NUS Repository.
Abstract: Information about an entity is now, more often than not, published by multiple sources. Each source may describe the same entity with distinct representations and provide incomplete information; the information may contain errors or be valid for different time periods. A complete picture of a real-world entity is often unavailable without integrating the data from different sources. In this thesis, we study how to construct an integrated profile for an entity by harvesting information from various sources. We first consider the case where an entity may change its attribute values over time. To understand the evolution patterns of entities, we develop a transition model that captures the probability that an entity changes to a particular attribute value after some time period. Then, through a source-aware temporal matching algorithm, we show how the transition model can be considered jointly with the freshness of data sources to link temporal records to entities in the right time period. In this way, an increasingly complete and up-to-date entity profile can be derived as more temporal records are aggregated from different sources. Next, we consider the case where the sources may provide erroneous values. We present a framework that collates data records from multiple sources and corrects any erroneous values contained in the records in order to construct accurate profiles for real-world entities. The proposed framework interleaves record matching with error correction, taking into consideration the varying source reliabilities on different attributes. It first uses confidence-based matching to discriminate records in terms of ambiguity and source reliability. Then it performs adaptive matching to reduce the impact of erroneous values on the matching decisions.
As future work, we jointly consider the above two cases, that is, an entity may change its attribute values over time and a source may publish erroneous values, and thus provide an integrated solution. Furthermore, we develop a data fusion model that is able to find multiple true values for an entity.
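To make the idea of a transition model concrete, the following is a minimal sketch of one way to estimate the probability that an entity moves from one attribute value to another after a given elapsed time, using simple frequency counts over observed histories. It is an illustration only and not the model developed in the thesis; the function names, the data layout (per-entity lists of year and value pairs), and the sample histories are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def fit_transition_model(histories):
    """Estimate P(new_value | old_value, elapsed) by frequency counting.

    histories: list of per-entity lists of (year, value), sorted by year.
    This layout is a hypothetical one chosen for the illustration.
    """
    counts = defaultdict(Counter)
    for history in histories:
        # Pair every observation with each later observation of the same entity.
        for i, (t0, v0) in enumerate(history):
            for t1, v1 in history[i + 1:]:
                counts[(v0, t1 - t0)][v1] += 1

    # Normalize the counts for each (old_value, elapsed) key into probabilities.
    model = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        model[key] = {value: n / total for value, n in counter.items()}
    return model


def transition_probability(model, old_value, elapsed, new_value):
    """Look up an estimated probability; unseen combinations yield 0.0."""
    return model.get((old_value, elapsed), {}).get(new_value, 0.0)


# Made-up job-title histories for three entities.
histories = [
    [(2000, "student"), (2004, "engineer"), (2008, "manager")],
    [(2001, "student"), (2005, "engineer"), (2009, "engineer")],
    [(2002, "student"), (2006, "student")],
]
model = fit_transition_model(histories)
p = transition_probability(model, "student", 4, "engineer")
```

In a temporal matching setting, a probability of this kind could serve as one signal for deciding whether a later record plausibly describes the same entity as an earlier one. A practical model would also need smoothing for unseen transitions and some bucketing of elapsed time, both of which this sketch omits.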
URI: http://scholarbank.nus.edu.sg/handle/10635/135440
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: thesis_furong.pdf
Size: 3.83 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.