Title: Random sampling and generation over data streams and graphs
Keywords: sampling, generation, Markov Chain Monte Carlo, large dataset, data stream, graph
Issue Date: 25-Jan-2013
Source: LU XUESONG (2013-01-25). Random sampling and generation over data streams and graphs. ScholarBank@NUS Repository.
Abstract: Sampling, or random sampling, is a ubiquitous tool for circumventing the scalability issues that arise when processing large datasets. The ability to generate representative samples of smaller size is useful not only for this purpose but also, in its own right, for statistical analysis, data processing and other data mining tasks. Generation is a related problem that aims to randomly generate elements with some particular characteristics from among all candidate elements. Classic examples are the various kinds of graph models. In this thesis, we focus on random sampling and generation problems over data streams and large graphs. We first describe, at a conceptual level, the relation between random sampling and generation. We also introduce three related problems, namely construction, enumeration and counting, and show that these three methods are ill-suited to finding representative samples of large datasets. We then identify problems encountered in the processing of data streams and large graphs, and devise novel and practical algorithms to solve them.
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: LuXuesong.pdf
Size: 1.84 MB
Format: Adobe PDF


