Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/77914
Title: Sampling from databases using B+Trees
Authors: Makawita, D.
Tan, K.-L. 
Liu, H.
Keywords: B+-Tree
quality of samples
weighted random sampling
Issue Date: 2002
Citation: Makawita, D., Tan, K.-L., Liu, H. (2002). Sampling from databases using B+Trees. Intelligent Data Analysis 6(4): 359-377. ScholarBank@NUS Repository.
Abstract: Sampling techniques are becoming increasingly important for very large databases. However, the problem of obtaining a random sample from index structures has not received much attention. In this paper, we examine sampling techniques for B+-trees. Because the fanout varies from node to node, a random walk through the index structure does not produce a representative sample of the data set. We propose a new technique, called B+-Tree based Weighted Random Sampling (BTWRS), that adjusts the inclusion probabilities of records so that more records are extracted from leaves along paths with higher fanouts. We evaluated our method extensively, and the results show that BTWRS improves on the existing schemes in terms of both the quality of the samples obtained and the efficiency of the sampling process. The proposed method can be readily adopted in existing commercial systems. © 2002 IOS Press. All rights reserved.
Source Title: Intelligent Data Analysis
URI: http://scholarbank.nus.edu.sg/handle/10635/77914
ISSN: 1088-467X
Appears in Collections: Staff Publications
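Illustrative note on the abstract: the bias it describes can be seen on a toy tree. The sketch below is not the BTWRS algorithm from the paper, whose details this record does not give. It shows a naive root-to-leaf random walk and a classical acceptance/rejection correction, assuming all leaves sit at the same depth and an upper bound on fanout is known. All names (Node, naive_walk, walk_probability, accept_reject_sample, build_example_tree) are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy tree node (hypothetical): internal nodes hold children, leaves hold records."""
    children: list = field(default_factory=list)
    records: list = field(default_factory=list)

    @property
    def is_leaf(self):
        return not self.children


def naive_walk(root, rng):
    """Pick a uniform child at each level, then a uniform record in the leaf.

    Returns the record and the fanouts seen along the path (leaf size last).
    """
    node, fanouts = root, []
    while not node.is_leaf:
        fanouts.append(len(node.children))
        node = rng.choice(node.children)
    fanouts.append(len(node.records))
    return rng.choice(node.records), fanouts


def walk_probability(fanouts):
    """Probability that the naive walk returns one particular record on this path."""
    p = 1.0
    for f in fanouts:
        p /= f
    return p


def accept_reject_sample(root, max_fanout, rng):
    """Repeat the naive walk, accepting with probability prod(f_i / max_fanout).

    With equal-depth leaves, every record is then accepted with the same
    probability per attempt, so the returned record is uniform over all records.
    """
    while True:
        record, fanouts = naive_walk(root, rng)
        accept = 1.0
        for f in fanouts:
            accept *= f / max_fanout
        if rng.random() < accept:
            return record


def build_example_tree():
    """Two leaves with different sizes under one root (assumed example data)."""
    return Node(children=[Node(records=[1, 2]), Node(records=[3, 4, 5, 6])])


if __name__ == "__main__":
    rng = random.Random(0)
    tree = build_example_tree()
    # Records in the small leaf are twice as likely under the naive walk.
    print(walk_probability([2, 2]), walk_probability([2, 4]))
    print([accept_reject_sample(tree, 4, rng) for _ in range(10)])
```

In this toy tree the naive walk reaches each record of the two-record leaf with probability 1/4, but each record of the four-record leaf with probability only 1/8. The rejection step evens this out at the cost of discarded walks. According to the abstract, BTWRS instead adjusts inclusion probabilities so that paths with higher fanouts contribute more records.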

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.