Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/135440
DC Field: Value
dc.title: PROFILING ENTITIES OVER TIME WITH UNRELIABLE SOURCES
dc.contributor.author: LI FURONG
dc.date.accessioned: 2017-04-30T18:00:17Z
dc.date.available: 2017-04-30T18:00:17Z
dc.date.issued: 2016-12-19
dc.identifier.citation: LI FURONG (2016-12-19). PROFILING ENTITIES OVER TIME WITH UNRELIABLE SOURCES. ScholarBank@NUS Repository.
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/135440
dc.description.abstract: Nowadays, an entity's information is, more often than not, published by multiple sources. Each source may describe the same entity with distinct representations and provide incomplete information; the information may contain errors or be valid for different time periods. A complete picture of a real-world entity is often unavailable without integrating the data from different sources. In this thesis, we study how to construct an integrated profile for an entity by harvesting information from various sources. We first consider the case where an entity may change its attribute values over time. To understand the evolution patterns of entities, we develop a transition model which captures the probability that an entity changes to a particular attribute value after some time period. Then, through a source-aware temporal matching algorithm, we showcase how the transition model can be considered jointly with the freshness of data sources to link temporal records to entities at the right time period. In this way, an increasingly complete and up-to-date entity profile can be derived as more and more temporal records are aggregated from different sources. Next, we consider the case where the sources may provide erroneous values. We present a framework that collates data records from multiple sources and corrects any erroneous values contained in the records in order to construct accurate profiles for real-world entities. The proposed framework interleaves record matching with error correction, taking into consideration the varying source reliabilities on different attributes. It first uses confidence-based matching to discriminate records in terms of ambiguity and source reliability. Then it performs adaptive matching to reduce the impact of erroneous values on the matching decisions.
As future work, we jointly consider the above two cases, that is, an entity may change its attribute values over time and a source may publish erroneous values, and thus provide an integrated solution. Furthermore, we develop a data fusion model that is able to find multiple true values for an entity.
dc.language.iso: en
dc.subject: data integration, entity profiling, record linkage, truth discovery
dc.type: Thesis
dc.contributor.department: COMPUTER SCIENCE
dc.contributor.supervisor: LEE MONG LI, JANICE
dc.description.degree: Ph.D
dc.description.degreeconferred: DOCTOR OF PHILOSOPHY
dc.identifier.isiut: NOT_IN_WOS
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: thesis_furong.pdf
Size: 3.83 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.