Please use this identifier to cite or link to this item: https://doi.org/10.1089/cmb.2008.0013
Title: Spectrum-based de novo repeat detection in genomic sequences
Authors: Do, H.H.; Choi, K.P.; Preparata, F.P.; Sung, W.K.; Zhang, L.
Keywords: Genomic repeat; Hamming metric; Levenshtein metric; Repeat finding; Sequence spectrum
Issue Date: 2008
Citation: Do, H.H., Choi, K.P., Preparata, F.P., Sung, W.K., Zhang, L. (2008). Spectrum-based de novo repeat detection in genomic sequences. Journal of Computational Biology 15 (5) : 469-487. ScholarBank@NUS Repository. https://doi.org/10.1089/cmb.2008.0013
Abstract: A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg. © Mary Ann Liebert, Inc. 2008.
Source Title: Journal of Computational Biology
URI: http://scholarbank.nus.edu.sg/handle/10635/43115
ISSN: 1066-5277
DOI: 10.1089/cmb.2008.0013
Appears in Collections: Staff Publications
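
The abstract defines the spectrum as the set of k-mers of a sequence and describes a second scan that labels the genome for hits in the spectrum of the repeat candidates. The sketch below illustrates only those two ideas on a toy string. It is not the SAGRI implementation: the function names (spectrum, repeated_kmers, hit_labels), the toy sequence, and the choice k = 3 are all assumptions made for illustration, and the Euler-path reconstruction, the probabilistic parameter selection, and the pairwise-alignment step described in the abstract are omitted.

```python
from collections import defaultdict


def spectrum(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def repeated_kmers(seq, k):
    """Map each k-mer occurring more than once to its start positions.

    Illustrative stand-in for spotting likely repeat sites; the paper's
    FindHit scan works differently (assumption: exact k-mer matches only).
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def hit_labels(seq, candidate_spectrum, k):
    """Label each k-mer start position 1 if its k-mer is in the candidate spectrum."""
    return [
        1 if seq[i:i + k] in candidate_spectrum else 0
        for i in range(len(seq) - k + 1)
    ]


if __name__ == "__main__":
    genome = "ACGTTACGTA"  # toy sequence, not real data
    k = 3
    print(sorted(spectrum(genome, k)))
    print(repeated_kmers(genome, k))
    # Spectrum of a hypothetical repeat candidate "ACGT"
    print(hit_labels(genome, spectrum("ACGT", k), k))
```

Runs of consecutive 1s in the labels mark regions with high hit density, which the abstract treats as evidence for the location of the first segment of a repeat. A real tool would then confirm each candidate pair by alignment.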
