Title: Alignment-free sequence comparison (I): Statistics and power
Authors: Reinert, G.
Chew, D. 
Sun, F.
Waterman, M.S.
Keywords: Alignment-free
Normal approximation
Normal distribution
Sequence alignment
Word count statistics
Issue Date: 1-Dec-2009
Citation: Reinert, G., Chew, D., Sun, F., Waterman, M.S. (2009-12-01). Alignment-free sequence comparison (I): Statistics and power. Journal of Computational Biology 16 (12) : 1615-1634. ScholarBank@NUS Repository.
Abstract: Large-scale comparison of the similarities between two biological sequences is a major issue in computational biology; a fast method, the $D_2$ statistic, relies on the comparison of the k-tuple content for both sequences. Although it has been known for some years that the $D_2$ statistic is not suitable for this task, as it tends to be dominated by single-sequence noise, to date no suitable adjustments have been proposed. In this article, we suggest two new variants of the $D_2$ word count statistic, which we call $D_2^S$ and $D_2^*$. For $D_2^S$, which is a self-standardized statistic, we show that the statistic is asymptotically normally distributed, when sequence lengths tend to infinity, and not dominated by the noise in the individual sequences. The second statistic, $D_2^*$, outperforms $D_2^S$ in terms of power for detecting the relatedness between the two sequences in our examples; but although it is straightforward to simulate from the asymptotic distribution of $D_2^*$, we cannot provide a closed form for power calculations. © 2009, Mary Ann Liebert, Inc.
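The abstract describes the statistics only in words, so a minimal Python sketch may help readers see what "comparing k-tuple content" means in practice. It assumes the definitions as we read them from the published article, which should be checked against the paper before being relied on. Let $X_w$ and $Y_w$ be the counts of word $w$ in sequences of lengths $n$ and $m$. Then $D_2 = \sum_w X_w Y_w$. Under an i.i.d. letter model, with $p_w$ the product of the letter probabilities of $w$, the centred counts are $\tilde X_w = X_w - (n-k+1)p_w$ and $\tilde Y_w = Y_w - (m-k+1)p_w$. From these, $D_2^* = \sum_w \tilde X_w \tilde Y_w / (\sqrt{(n-k+1)(m-k+1)}\,p_w)$ and $D_2^S = \sum_w \tilde X_w \tilde Y_w / \sqrt{\tilde X_w^2 + \tilde Y_w^2}$. All function names are hypothetical, and this is not the authors' code. Two further choices are assumptions: treating a 0/0 term in $D_2^S$ as zero, and estimating letter probabilities by pooling both sequences.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-tuples (words of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: str, y: str, k: int) -> int:
    """D2 = sum_w X_w * Y_w; only words present in both sequences contribute."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())


def centred_terms(x: str, y: str, k: int, letter_probs: dict):
    """Yield (centred X_w, centred Y_w, p_w) for every word w over the alphabet.

    Assumes an i.i.d. letter model: p_w is the product of its letter
    probabilities and the expected count of w is (len - k + 1) * p_w.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    nbar, mbar = len(x) - k + 1, len(y) - k + 1
    # Every word over the alphabet is visited, including unseen ones,
    # because a zero count still has a nonzero centred value.
    for letters in product(letter_probs, repeat=k):
        w = "".join(letters)
        p_w = prod(letter_probs[a] for a in letters)
        yield cx[w] - nbar * p_w, cy[w] - mbar * p_w, p_w


def d2_star(x: str, y: str, k: int, letter_probs: dict) -> float:
    """D2* = sum_w Xt_w * Yt_w / (sqrt(nbar * mbar) * p_w); needs every p_w > 0."""
    scale = sqrt((len(x) - k + 1) * (len(y) - k + 1))
    return sum(xt * yt / (scale * p_w)
               for xt, yt, p_w in centred_terms(x, y, k, letter_probs))


def d2_s(x: str, y: str, k: int, letter_probs: dict) -> float:
    """D2S = sum_w Xt_w * Yt_w / sqrt(Xt_w**2 + Yt_w**2)."""
    total = 0.0
    for xt, yt, _ in centred_terms(x, y, k, letter_probs):
        denom = sqrt(xt * xt + yt * yt)
        if denom > 0:  # assumed convention: a 0/0 term contributes nothing
            total += xt * yt / denom
    return total


def pooled_letter_probs(x: str, y: str, alphabet: str = "ACGT") -> dict:
    """Hypothetical helper: estimate letter probabilities from both sequences."""
    counts = Counter(x + y)
    return {a: counts[a] / (len(x) + len(y)) for a in alphabet}


if __name__ == "__main__":
    x, y = "ACGTACGTTGCA", "ACGTTGCAACGT"
    p = pooled_letter_probs(x, y)
    print(d2(x, y, 2), d2_star(x, y, 2, p), d2_s(x, y, 2, p))
```

The sketch enumerates all $|A|^k$ words so that unseen words still contribute their centred terms. That keeps it close to the formulas but limits it to small k. A practical implementation would handle the unseen words analytically instead of looping over them.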
Source Title: Journal of Computational Biology
ISSN: 1066-5277
DOI: 10.1089/cmb.2009.0198
Appears in Collections: Staff Publications


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.