Please use this identifier to cite or link to this item: https://doi.org/10.1089/cmb.2009.0198
DC Field: Value
dc.title: Alignment-free sequence comparison (I): Statistics and power
dc.contributor.author: Reinert, G.
dc.contributor.author: Chew, D.
dc.contributor.author: Sun, F.
dc.contributor.author: Waterman, M.S.
dc.date.accessioned: 2014-10-28T05:09:54Z
dc.date.available: 2014-10-28T05:09:54Z
dc.date.issued: 2009-12-01
dc.identifier.citation: Reinert, G., Chew, D., Sun, F., Waterman, M.S. (2009-12-01). Alignment-free sequence comparison (I): Statistics and power. Journal of Computational Biology 16 (12) : 1615-1634. ScholarBank@NUS Repository. https://doi.org/10.1089/cmb.2009.0198
dc.identifier.issn: 10665277
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/104987
dc.description.abstract: Large-scale comparison of the similarities between two biological sequences is a major issue in computational biology; a fast method, the D2 statistic, relies on the comparison of the k-tuple content for both sequences. Although it has been known for some years that the D2 statistic is not suitable for this task, as it tends to be dominated by single-sequence noise, to date no suitable adjustments have been proposed. In this article, we suggest two new variants of the D2 word count statistic, which we call D2^S and D2^*. For D2^S, which is a self-standardized statistic, we show that the statistic is asymptotically normally distributed, when sequence lengths tend to infinity, and not dominated by the noise in the individual sequences. The second statistic, D2^*, outperforms D2^S in terms of power for detecting the relatedness between the two sequences in our examples; but although it is straightforward to simulate from the asymptotic distribution of D2^*, we cannot provide a closed form for power calculations. © 2009, Mary Ann Liebert, Inc.
dc.description.uri: http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1089/cmb.2009.0198
dc.source: Scopus
dc.subject: Alignment-free
dc.subject: Normal approximation
dc.subject: Normal distribution
dc.subject: Sequence alignment
dc.subject: Word count statistics
dc.type: Article
dc.contributor.department: STATISTICS & APPLIED PROBABILITY
dc.description.doi: 10.1089/cmb.2009.0198
dc.description.sourcetitle: Journal of Computational Biology
dc.description.volume: 16
dc.description.issue: 12
dc.description.page: 1615-1634
dc.description.coden: JCOBE
dc.identifier.isiut: 000273709400001
Appears in Collections: Staff Publications
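
Illustrative note on the abstract: the D2 statistic it mentions compares the k-tuple (word) content of two sequences. The sketch below is not taken from this record, which gives no formulas. It assumes the forms commonly stated in the alignment-free literature: D2 as the sum over words of the product of the two word counts, and D2^S and D2^* as centered versions of that sum under an i.i.d. letter model. The function names and the `probs` parameter (letter probabilities supplied by the caller) are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt


def count_words(seq, k):
    """Count overlapping k-tuples (words) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2: sum over words w of X_w * Y_w (assumed standard form)."""
    x, y = count_words(a, k), count_words(b, k)
    return sum(x[w] * y[w] for w in x.keys() & y.keys())


def d2_variants(a, b, k, probs):
    """Return (D2^S, D2^*) under an assumed i.i.d. letter model.

    probs maps each letter to its probability (an assumption of this sketch).
    Counts are centered by their expectation: X_w - (n - k + 1) * p_w.
    """
    x, y = count_words(a, k), count_words(b, k)
    n_bar, m_bar = len(a) - k + 1, len(b) - k + 1
    d2s = 0.0
    d2star = 0.0
    for letters in product(probs, repeat=k):
        w = "".join(letters)
        p_w = prod(probs[c] for c in letters)
        xc = x[w] - n_bar * p_w  # centered count in sequence a
        yc = y[w] - m_bar * p_w  # centered count in sequence b
        norm = sqrt(xc * xc + yc * yc)
        if norm > 0:
            d2s += xc * yc / norm  # self-standardized term
        d2star += xc * yc / (sqrt(n_bar * m_bar) * p_w)
    return d2s, d2star
```

For example, `d2("ACGT", "ACGT", 2)` counts the three shared 2-tuples AC, CG and GT once each. Passing uniform probabilities such as `{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}` to `d2_variants` is only a placeholder; in practice the letter probabilities would have to be known or estimated.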

Files in This Item:
There are no files associated with this item.

Scopus citations: 153 (checked on Nov 30, 2022)
Web of Science citations: 137 (checked on Nov 30, 2022)
Page view(s): 383 (checked on Nov 24, 2022)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.