|Title:||Spectrum-based de novo repeat detection in genomic sequences|
|Source:||Do, H.H., Choi, K.P., Preparata, F.P., Sung, W.K., Zhang, L. (2008). Spectrum-based de novo repeat detection in genomic sequences. Journal of Computational Biology 15 (5) : 469-487. ScholarBank@NUS Repository. https://doi.org/10.1089/cmb.2008.0013|
|Abstract:||A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg. © Mary Ann Liebert, Inc. 2008.|
|Source Title:||Journal of Computational Biology|
|Appears in Collections:||Staff Publications|
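
The abstract's two central notions, the spectrum (the set of k-mers of a sequence) and the hit density measured in the second scan, can be illustrated with a short Python sketch. This is not the SAGRI implementation: the function names `spectrum` and `hit_density`, the fixed-size sliding window, and the toy sequences are assumptions made only for illustration, and the sketch omits the FindHit scan, the probabilistic parameter selection, and the final pairwise alignment.

```python
def spectrum(seq, k):
    """Return the set of all k-mers that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hit_density(genome, candidate_spectrum, k, window):
    """Slide a window over the genome's k-mer positions and return, for each
    window start, the fraction of positions whose k-mer is in
    candidate_spectrum (illustrative stand-in for the second scan)."""
    n = len(genome) - k + 1  # number of k-mer positions in the genome
    hits = [1 if genome[i:i + k] in candidate_spectrum else 0 for i in range(n)]
    densities = []
    running = sum(hits[:window])  # hits in the first window
    for start in range(n - window + 1):
        densities.append(running / window)
        if start + window < n:
            # Shift the window right by one position.
            running += hits[start + window] - hits[start]
    return densities


# Toy data (hypothetical): a candidate repeat that occurs at both ends of a genome.
candidate = "ACGTACGT"
genome = candidate + "TTTTTTTT" + candidate
densities = hit_density(genome, spectrum(candidate, 4), k=4, window=5)
```

In this toy setting, windows lying inside either copy of the candidate have high density and windows inside the `T` run have low density, which mirrors the idea that high hit density marks the likely location of a repeat segment.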