Title: Privacy protection via anonymization for publishing multi-type data
Keywords: privacy, data publication, anonymization
Issue Date: 14-Jun-2012
Citation: XUE MINGQIANG (2012-06-14). Privacy protection via anonymization for publishing multi-type data. ScholarBank@NUS Repository.
Abstract: Organizations often possess data that they wish to make public for the common good. Yet such published data often contains sensitive personal information, posing a serious privacy threat to individuals. Anonymization is the process of removing identifiable information from the data while preserving as much data utility as possible for accurate data analysis. Due to the importance of privacy, researchers in recent years have been drawn to designing new privacy models and anonymization algorithms for privacy-preserving data publication. Despite their efforts, many outstanding problems remain to be solved. We aim to contribute to state-of-the-art data anonymization schemes, with an emphasis on different data models for data publication. Specifically, we study and propose new data anonymization schemes for the three data types most investigated in the literature, namely set-valued data, social graph data, and relational data. These three types of data are commonly encountered in daily life, so privacy in their publication is of crucial importance. Examples of the three types are grocery transaction records, relationship data in online social networks, and government census data, respectively. We have adapted two common approaches to data anonymization, i.e. perturbation and generalization. For set-valued data publication, we propose a nonreciprocal anonymization scheme that yields higher utility than existing approaches based on reciprocal coding. An important reason why we can achieve better utility is that we generate a utility-efficient order for the dataset using techniques such as Gray sort, TSP reordering and dynamic partitioning, so that similar records are grouped during anonymization. We also propose a superior model for data publishing which allows more utility to be preserved than other approaches such as entry suppression.
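As a rough illustration of the ordering idea mentioned above, the following minimal sketch sorts set-valued records by their position in the reflected binary Gray code sequence, so that records adjacent in the order tend to differ in few items. It is not the thesis's algorithm: the function names `gray_rank` and `gray_sort`, the fixed item `universe`, and the encoding of each record as a bit-vector are all assumptions made for this example.

```python
def gray_rank(bits):
    """Return the position of a bit-vector in the reflected binary Gray code sequence."""
    rank = 0
    acc = 0
    for b in bits:
        acc ^= b                  # prefix XOR converts a Gray codeword to plain binary
        rank = (rank << 1) | acc
    return rank


def gray_sort(records, universe):
    """Order set-valued records by the Gray rank of their bit-vector encoding.

    `universe` is an assumed fixed, ordered list of all items; it defines
    which bit position each item occupies.
    """
    def to_bits(record):
        return [1 if item in record else 0 for item in universe]

    return sorted(records, key=lambda r: gray_rank(to_bits(r)))


if __name__ == "__main__":
    universe = ["bread", "milk"]
    records = [{"bread"}, {"bread", "milk"}, set(), {"milk"}]
    for record in gray_sort(records, universe):
        print(sorted(record))
```

In the resulting order each record differs from its neighbour by exactly one item in this small case, which is the property that makes such an order attractive for grouping similar records.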
For social graph publication, we study the effectiveness of using random edge perturbation as a privacy protection scheme. Previous research rejects random edge perturbation as a defence against structural attacks on social graphs on the grounds that it severely destroys graph utility. On the contrary, we show that, by exploiting the statistical properties of random edge perturbation, it is possible to accurately recover important graph utilities such as density, transitivity, degree distribution and modularity from the perturbed graph using estimation algorithms. We then show that, based on the same principle, attackers can launch a more sophisticated interval-walk attack which yields a higher probability of success than the conventional walk-based attack. We study the conditions for preventing the interval-walk attack and more general structural attacks using random perturbation. For relational data publication, we propose a novel pattern-preserving anonymization scheme based on perturbation. Using our scheme, the owner can define a set of Properties of Interest (PoIs) which he wishes to preserve from the original data. These PoIs are described as linear relationships among the data points. During anonymization, our scheme ensures that the predefined patterns are strictly preserved while making the anonymized data sufficiently randomized. Traditional generalization-based and perturbation-based approaches either completely mask or obfuscate the patterns. The resulting data is ideal for data mining tasks such as clustering or ranking, which require the preservation of relative distances. Extensive experimental results based on both synthetic and real data are presented to verify the effectiveness of our solutions.
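To make the claim about recovering graph utilities from a perturbed graph more concrete, here is a minimal sketch for the simplest case, density. It assumes a symmetric perturbation model in which every vertex pair is flipped (edge added or removed) independently with a known probability p; the thesis may use a different model or estimator. Under that assumption the expected observed density is d(1 - p) + (1 - d)p, which can be inverted to give d = (observed - p) / (1 - 2p). The function names `perturb_edges` and `estimate_density` are hypothetical.

```python
import random
from itertools import combinations


def perturb_edges(n, edges, p, rng):
    """Flip each of the n*(n-1)/2 vertex pairs independently with probability p.

    Assumed model: an existing edge is removed, or a missing edge is added,
    with the same probability p.
    """
    original = {frozenset(e) for e in edges}
    perturbed = set()
    for u, v in combinations(range(n), 2):
        pair = frozenset((u, v))
        present = pair in original
        if rng.random() < p:
            present = not present
        if present:
            perturbed.add(pair)
    return perturbed


def estimate_density(n, perturbed, p):
    """Estimate the original density from the perturbed graph (requires p != 0.5)."""
    pairs = n * (n - 1) / 2
    observed = len(perturbed) / pairs
    # E[observed] = d * (1 - p) + (1 - d) * p, solved for d.
    return (observed - p) / (1 - 2 * p)


if __name__ == "__main__":
    n = 200
    edges = [(u, v) for u, v in combinations(range(n), 2) if (u + v) % 10 == 0]
    true_density = len(edges) / (n * (n - 1) / 2)
    noisy = perturb_edges(n, edges, p=0.1, rng=random.Random(0))
    print(true_density, estimate_density(n, noisy, p=0.1))
```

The estimate is unbiased under the assumed model, and its variance grows as p approaches 0.5, where the perturbed graph carries no information about the original.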
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: thesis.pdf; Size: 1.96 MB; Format: Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.