Please use this identifier to cite or link to this item: https://doi.org/10.1089/cmb.2006.0008
Title: Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search
Authors: Kong, Y. 
Keywords: Filtration technique; Generating function; Goulden-Jackson cluster method; Homology search; Sequence alignment
Issue Date: Mar-2007
Source: Kong, Y. (2007-03). Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search. Journal of Computational Biology 14 (2) : 238-254. ScholarBank@NUS Repository. https://doi.org/10.1089/cmb.2006.0008
Abstract: The Goulden-Jackson cluster method is a powerful method to calculate the probability of occurrences of a pattern or set of patterns in a sequence. If the patterns contain wildcard characters, however, the size of the connector matrix grows exponentially with the number of wildcards. Here we show that the average correlation c̄(z) is a good predictor of the hitting probability q_n, and that the generalized correlation function ĉ(z) can be used to approximate c̄(z) efficiently. We apply the method to the problem of optimal multiple spaced seed selection for homology search. We reexamine the concept of optimal sensitivity of spaced seeds and show that it is better to select optimal seeds based on some average properties, such as c̄(1), which is the expectation of the first hitting length. Higher order approximations can also be constructed easily. Tests on arbitrarily large genomic data with multiple seeds show that the optimal multiple seeds selected by the methods are indeed more sensitive. The methods provide a theoretical background on which various empirical observations can be unified and further heuristic search methods can be developed. © Mary Ann Liebert, Inc.
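To make the hitting probability q_n mentioned in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the Goulden-Jackson cluster method or the correlation-function approximation of the paper; it is a plain dynamic program over the most recent alignment positions. It assumes a simple model that the record does not spell out: each position of a length-n region is a match (1) with independent probability p, seeds are written as strings of '1' (required match) and '*' (wildcard), and a seed set hits if any seed matches at any offset. The function names are hypothetical.

```python
def hits_at_end(seed, window):
    """True if `seed` matches the last len(seed) positions of `window`."""
    if len(window) < len(seed):
        return False
    start = len(window) - len(seed)
    # Only '1' positions of the seed require a match; '*' is a wildcard.
    return all(window[start + j] == 1 for j, c in enumerate(seed) if c == "1")


def hit_probability(seeds, n, p):
    """Probability that at least one seed hits a random region of length n.

    Assumed model: positions are independent, each a match with probability p.
    The state is the tuple of the most recent (span - 1) positions; the stored
    value is the probability of reaching that state with no hit so far.
    """
    span = max(len(s) for s in seeds)
    states = {(): 1.0}
    for _ in range(n):
        nxt = {}
        for hist, prob in states.items():
            for bit, weight in ((1, p), (0, 1.0 - p)):
                window = hist + (bit,)
                if any(hits_at_end(s, window) for s in seeds):
                    continue  # a hit occurred: this mass leaves the no-hit set
                key = window[-(span - 1):] if span > 1 else ()
                nxt[key] = nxt.get(key, 0.0) + prob * weight
        states = nxt
    return 1.0 - sum(states.values())


if __name__ == "__main__":
    # Compare a contiguous seed with a spaced seed of the same weight.
    for seed_set in (["111"], ["11*1"], ["11*1", "1*11"]):
        print(seed_set, hit_probability(seed_set, n=20, p=0.7))
```

The state space grows as 2^(span - 1), so this sketch is only practical for short seeds; that exponential growth is the kind of cost the abstract says the correlation-based approximations are meant to avoid.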
Source Title: Journal of Computational Biology
URI: http://scholarbank.nus.edu.sg/handle/10635/103326
ISSN: 1066-5277
DOI: 10.1089/cmb.2006.0008
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.