Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/38750
Title: | Random sampling and generation over data streams and graphs |
Authors: | LU XUESONG |
Keywords: | sampling, generation, Markov Chain Monte Carlo, large dataset, data stream, graph |
Issue Date: | 25-Jan-2013 |
Citation: | LU XUESONG (2013-01-25). Random sampling and generation over data streams and graphs. ScholarBank@NUS Repository. |
Abstract: | Sampling, or random sampling, is a ubiquitous tool for circumventing the scalability issues that arise when processing large datasets. The ability to generate smaller representative samples is useful for more than scalability: it also serves statistical analysis, data processing and other data mining tasks in its own right. Generation is a related problem that aims to randomly generate elements with particular characteristics from among all candidate elements. Classic examples are the various kinds of graph models. In this thesis, we focus on random sampling and generation problems over data streams and large graphs. We first establish the conceptual relation between random sampling and generation. We also introduce three related problems, namely construction, enumeration and counting, and show the pitfalls of using these three methods to find representative samples of large datasets. We then identify problems encountered when processing data streams and large graphs, and devise novel and practical algorithms to solve them. |
URI: | http://scholarbank.nus.edu.sg/handle/10635/38750 |
Appears in Collections: | Ph.D Theses (Open) |
Files in This Item:
File | Description | Size | Format | Access Settings | Version |
---|---|---|---|---|---|
LuXuesong.pdf | | 1.84 MB | Adobe PDF | OPEN | None |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.