Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/39713
DC Field | Value | |
---|---|---|
dc.title | Constructing compressed suffix arrays with large alphabets | |
dc.contributor.author | Hon, W.-K. | |
dc.contributor.author | Lam, T.-W. | |
dc.contributor.author | Sadakane, K. | |
dc.contributor.author | Sung, W.-K. | |
dc.date.accessioned | 2013-07-04T07:47:52Z | |
dc.date.available | 2013-07-04T07:47:52Z | |
dc.date.issued | 2003 | |
dc.identifier.citation | Hon, W.-K.,Lam, T.-W.,Sadakane, K.,Sung, W.-K. (2003). Constructing compressed suffix arrays with large alphabets. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2906 : 240-249. ScholarBank@NUS Repository. | |
dc.identifier.issn | 03029743 | |
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/39713 | |
dc.description.abstract | Recent research in compressing suffix arrays has resulted in two breakthrough indexing data structures, namely, compressed suffix arrays (CSA) [7] and FM-index [5]. Either of them makes it feasible to store a full-text index in the main memory even for a piece of text data with a few billion characters (such as human DNA). However, constructing such indexing data structures with limited working memory (i.e., without constructing suffix arrays) is not a trivial task. This paper addresses this problem. Currently, only CSA admits a space-efficient construction algorithm [15]. For a text T of length n over an alphabet Ó, this algorithm requires O(|∑|n log n) time and (2Ho + 1 + ε)n bits of working space, where Ho is the 0-th order empirical entropy of T and ε is any non-zero constant. This algorithm is good enough when the alphabet size |∑| is small. It is not practical for text data containing protein, Chinese or Japanese, where the alphabet may include up to a few thousand characters. The main contribution of this paper is a new algorithm which can construct CSA in O(n log n) time using (Ho + 2 + ε)n bits of working space. Note that the running time of our algorithm is independent of the alphabet size and the space requirement is smaller as it is likely that Ho > 1. This paper also makes contribution to the space-efficient construction of FM-index. We show that FM-index can indeed be constructed from CSA directly in O(n) time. © Springer-Verlag Berlin Heidelberg 2003. | |
dc.source | Scopus | |
dc.type | Article | |
dc.contributor.department | COMPUTER SCIENCE | |
dc.description.sourcetitle | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | |
dc.description.volume | 2906 | |
dc.description.page | 240-249 | |
dc.identifier.isiut | NOT_IN_WOS | |
Appears in Collections: | Staff Publications |
Show simple item record
Files in This Item:
There are no files associated with this item.
Google ScholarTM
Check
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.