Please use this identifier to cite or link to this item: https://doi.org/10.1198/016214505000001401
Title: Distribution of runs and longest runs: A new generating function approach
Authors: Kong, Y. 
Keywords: Biological sequence analysis
Combinatorial analysis
Distribution-free statistical test
Lattice model
Partition function
Randomness test
Issue Date: Sep-2006
Citation: Kong, Y. (2006-09). Distribution of runs and longest runs: A new generating function approach. Journal of the American Statistical Association 101 (475) : 1253-1263. ScholarBank@NUS Repository. https://doi.org/10.1198/016214505000001401
Abstract: Exact distributions of run statistics are traditionally obtained using combinatorial methods, which, under certain situations, become very tedious. Run distributions of multiple object systems, although appearing frequently in applications from various fields, such as computational biology, are not commonly used, due in part to the lack of easy-to-use formulas. In this article, a method for evaluating partition functions of lattice models in the field of statistical mechanics is used to develop a systematic method to study various run statistics in multiple object systems. By using particular generating functions for the specified situation under study, many new distributions can be obtained in a unified and coherent way. The method makes it possible to manipulate formulas of run statistics by using binomial identities to obtain more general, yet simpler formulas. To illustrate the applications of the general method, the distributions of the total number of runs and the longest runs are investigated. Novel and general explicit formulas are derived for the distribution and moments of the total number of runs, and simple explicit formulas are derived for the distributions of the longest runs. In addition, some classical run statistics are recovered and generalized in the same unified way. As examples of applications to biological sequence analysis, the run statistics developed using the general method are applied to several protein sequences to examine their global and local features. © 2006 American Statistical Association.
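As a hedged illustration of the two run statistics the abstract focuses on, the sketch below computes exact distributions by plain dynamic programming over maximal runs. This is not the article's generating-function method. It assumes every distinct arrangement of a multiset with counts n_1, ..., n_k is equally likely, and it takes the longest run L to be the longest run of any symbol. The function names (`run_distribution`, `longest_run_cdf`, `_runs_counter`) are hypothetical and exist only for this illustration.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache
from math import factorial, prod


def _runs_counter(counts, max_len):
    """Count arrangements of the multiset, keyed by total number of runs,
    keeping only arrangements whose runs all have length <= max_len."""
    k = len(counts)

    @lru_cache(maxsize=None)
    def ways(remaining, last):
        # Completions from this state, as {runs_added: number_of_ways}.
        if not any(remaining):
            return {0: 1}
        tally = Counter()
        for t in range(k):
            if t == last or remaining[t] == 0:
                continue  # a new maximal run must switch to another symbol
            for length in range(1, min(remaining[t], max_len) + 1):
                nxt = remaining[:t] + (remaining[t] - length,) + remaining[t + 1:]
                for r, c in ways(nxt, t).items():
                    tally[r + 1] += c
        return dict(tally)

    return ways(tuple(counts), -1)


def total_arrangements(counts):
    # Multinomial coefficient n! / (n_1! ... n_k!).
    return factorial(sum(counts)) // prod(factorial(n) for n in counts)


def run_distribution(counts):
    """Exact P(R = r) for the total number of runs R."""
    total = total_arrangements(counts)
    table = _runs_counter(tuple(counts), sum(counts))  # no length cap
    return {r: Fraction(c, total) for r, c in sorted(table.items())}


def longest_run_cdf(counts, m):
    """Exact P(L <= m), where L is the longest run of any symbol."""
    total = total_arrangements(counts)
    return Fraction(sum(_runs_counter(tuple(counts), m).values()), total)
```

For example, `run_distribution((2, 2))` assigns probability 1/3 each to 2, 3 and 4 runs (AABB/BBAA, ABBA/BAAB, ABAB/BABA), and P(L = m) follows as `longest_run_cdf(counts, m) - longest_run_cdf(counts, m - 1)`. The number of DP states grows with the product of (n_i + 1), so this is only practical as a small-sample cross-check against closed-form results.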
Source Title: Journal of the American Statistical Association
URI: http://scholarbank.nus.edu.sg/handle/10635/104693
ISSN: 0162-1459
DOI: 10.1198/016214505000001401
Appears in Collections: Staff Publications
