|Title:||Sampling from databases using B+Trees|
|Keywords:||quality of samples; weighted random sampling|
|Source:||Makawita, D., Tan, K.-L., Liu, H. (2002). Sampling from databases using B+Trees. Intelligent Data Analysis 6 (4): 359-377. ScholarBank@NUS Repository.|
|Abstract:||Sampling techniques are becoming increasingly important for very large databases. However, the problem of obtaining a random sample from index structures has not received much attention. In this paper, we examine sampling techniques for B+ trees. Because the fanout of each node varies, a random walk through the index structure does not produce a representative sample of the data set. We propose a new technique, called B+ Tree based Weighted Random Sampling (BTWRS), that alters the inclusion probabilities of records accordingly, so that more records are extracted from leaves along the paths with higher fanouts. We extensively evaluated our method, and the results show that BTWRS improves on the existing schemes in terms of both the quality of the samples obtained and the efficiency of the sampling process. The proposed method can be readily adopted in existing commercial systems. © 2002-IOS Press. All rights reserved.|
|Source Title:||Intelligent Data Analysis|
|Appears in Collections:||Staff Publications|
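
The bias that the abstract attributes to varying fanout can be illustrated on a toy tree. The sketch below is not the paper's BTWRS algorithm, which the abstract does not specify. It only computes the exact inclusion probabilities of a naive random walk and then applies a standard acceptance/rejection correction. The tree layout (lists for internal nodes, tuples for leaves), the names `TREE`, `walk_probabilities`, and `sample_uniform`, and the `max_fanout` parameter are all assumptions made for illustration.

```python
import random
from collections import Counter
from fractions import Fraction

# Hypothetical toy index: an internal node is a list of children,
# a leaf is a tuple of records. All leaves sit at the same depth,
# as they would in a balanced B+ tree.
TREE = [("a",), ("b", "c", "d")]


def walk_probabilities(node, p=Fraction(1)):
    """Exact probability that a naive random walk (uniform choice of
    child at every node, uniform choice of record in the leaf) returns
    each record."""
    if isinstance(node, tuple):  # leaf: split p evenly over its records
        return {rec: p / len(node) for rec in node}
    probs = {}
    for child in node:  # internal node: split p evenly over its children
        probs.update(walk_probabilities(child, p / len(node)))
    return probs


def sample_uniform(root, max_fanout, rng):
    """Acceptance/rejection walk: each visited node is accepted with
    probability fanout / max_fanout, otherwise the walk restarts at the
    root. Every record is then reached with the same probability per
    attempt, (1 / max_fanout) ** depth."""
    while True:
        node = root
        while True:
            if rng.random() >= len(node) / max_fanout:
                break  # rejected: restart from the root
            if isinstance(node, tuple):
                return rng.choice(node)  # accepted leaf: pick a record
            node = rng.choice(node)  # accepted internal node: descend


# Naive walk: record "a" is three times as likely as "b", "c", or "d".
print(walk_probabilities(TREE))

# Corrected walk: counts should be roughly equal across the four records.
rng = random.Random(0)
print(Counter(sample_uniform(TREE, 3, rng) for _ in range(6000)))
```

In this toy tree the naive walk returns `a` with probability 1/2 and each of `b`, `c`, `d` with probability 1/6, because the sparsely filled leaf is as likely to be visited as the full one. The rejection step removes that bias at the cost of restarted walks. According to the abstract, BTWRS addresses the same bias by reweighting inclusion probabilities.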