Title: Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search
Authors: Kong, Y. 
Keywords: Filtration technique
Generating function
Goulden-Jackson cluster method
Homology search
Sequence alignment
Issue Date: Mar-2007
Citation: Kong, Y. (2007-03). Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search. Journal of Computational Biology 14 (2) : 238-254. ScholarBank@NUS Repository.
Abstract: The Goulden-Jackson cluster method is a powerful method to calculate the probability of occurrences of a pattern or set of patterns in a sequence. If the patterns contain wildcard characters, however, the size of the connector matrix grows exponentially with the number of wildcards. Here we show that the average correlation c̄(z) is a good predictor of the hitting probability q_n, and that the generalized correlation function ĉ(z) can be used to approximate c̄(z) efficiently. We apply the method to the problem of optimal multiple spaced seed selection for homology search. We reexamine the concept of optimal sensitivity of spaced seeds and show that it is better to select optimal seeds based on some average properties, such as c̄(1), which is the expectation of the first hitting length. Higher order approximations can also be constructed easily. Tests on arbitrarily large genomic data with multiple seeds show that the optimal multiple seeds selected by the methods are indeed more sensitive. The methods provide a theoretical background on which various empirical observations can be unified and further heuristic search methods can be developed. © Mary Ann Liebert, Inc.
Source Title: Journal of Computational Biology
ISSN: 1066-5277
DOI: 10.1089/cmb.2006.0008
Appears in Collections: Staff Publications
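
Illustrative note (not part of the published record): the abstract refers to the hitting probability of a spaced seed. As a rough sketch of what that quantity means, the Python below computes the probability that a single spaced seed hits a random alignment of length n at least once. It uses a brute-force dynamic program over the last span-1 positions, not the Goulden-Jackson cluster method or the correlation functions of the paper. The function name `hit_probability`, the seed notation ('1' for a required match, '*' for a wildcard), and the i.i.d. match probability `p` are assumptions made for this example only.

```python
from itertools import product


def hit_probability(seed: str, n: int, p: float) -> float:
    """Probability that `seed` hits a random alignment of length n at least once.

    seed: string over {'1', '*'}; '1' is a required match, '*' a wildcard
          (hypothetical notation chosen for this sketch).
    p:    probability that any alignment position is a match, assumed
          independent and identical across positions.
    """
    span = len(seed)
    if n < span:
        return 0.0  # the seed does not fit, so it cannot hit
    required = [i for i, ch in enumerate(seed) if ch == "1"]

    # State: the last span-1 alignment bits. Value: probability of reaching
    # that state without any hit so far.
    states = {}
    for bits in product((0, 1), repeat=span - 1):
        prob = 1.0
        for b in bits:
            prob *= p if b else 1.0 - p
        states[bits] = prob

    # Extend the alignment one position at a time; each step completes one
    # new window of length `span`.
    for _ in range(n - span + 1):
        nxt = {}
        for bits, prob in states.items():
            for b in (0, 1):
                window = bits + (b,)
                if all(window[i] for i in required):
                    continue  # seed hits here; remove this mass from "no hit"
                key = window[1:]
                nxt[key] = nxt.get(key, 0.0) + prob * (p if b else 1.0 - p)
        states = nxt

    return 1.0 - sum(states.values())


if __name__ == "__main__":
    # Contiguous seed "11" on an alignment of length 2: both positions must match.
    print(hit_probability("11", 2, 0.5))  # 0.25
    # A spaced seed with one wildcard on a longer alignment.
    print(hit_probability("11*1", 64, 0.7))
```

The state space grows as 2^(span-1), so this is only practical for short seeds; it is meant to make the quantity concrete, not to reproduce the approximations described in the abstract.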

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.