A new approach for similarity queries of biological sequences in databases
Ng, H.K. ; Ning, K. ; Leong, H.W.
Abstract
As biological databases grow larger, querying the biological sequences in these databases effectively has become an increasingly important issue for researchers. Few systems currently offer fast access to very large biological sequences. In this paper, we propose a new approach to similarity querying of biological sequences in databases. The general idea is to first transform the biological sequences into vectors and then into 2-d points in a plane; then index these points with a spatial index using self-organizing maps (SOM), and perform a single efficient similarity query (with multiple simultaneous input sequences) using a fast algorithm, the multi-point range query (MPRQ) algorithm. This approach works well because multiple sequence similarity queries can be performed, and their results returned, with just one MPRQ query, which yields substantial savings in query time. We applied our method to DNA and protein sequences in a database, and the results show that our algorithm is efficient in time and that its accuracy is satisfactory. © Springer-Verlag Berlin Heidelberg 2007.
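The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every specific in it is an assumption made for illustration: the abstract does not say how sequences are encoded, so `kmer_vector` uses normalized 2-mer frequencies; `to_point` is a fixed linear projection standing in for the SOM step; and `multi_point_range_query` is a simple uniform-grid lookup that answers several query points in one pass, not the MPRQ algorithm itself. All function names and parameters are hypothetical.

```python
from collections import defaultdict
from itertools import product
import math

ALPHABET = "ACGT"  # DNA only; a protein alphabet would need a different encoding


def kmer_vector(seq, k=2):
    """Normalized k-mer frequency vector (assumed encoding, not from the paper)."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]


def to_point(vec):
    """Placeholder 2-d projection standing in for the SOM mapping."""
    half = len(vec) // 2
    return (sum(vec[:half]), sum(vec[0::2]))


def build_grid(points, cell=0.1):
    """Uniform grid index: cell coordinates -> list of (name, point)."""
    grid = defaultdict(list)
    for name, (x, y) in points.items():
        grid[(int(x // cell), int(y // cell))].append((name, (x, y)))
    return grid


def multi_point_range_query(grid, queries, radius, cell=0.1):
    """Answer all query points together, visiting each candidate cell once."""
    # Record which queries have a search box overlapping each grid cell.
    cell_to_queries = defaultdict(list)
    for qi, (qx, qy) in enumerate(queries):
        for cx in range(int((qx - radius) // cell), int((qx + radius) // cell) + 1):
            for cy in range(int((qy - radius) // cell), int((qy + radius) // cell) + 1):
                cell_to_queries[(cx, cy)].append(qi)
    hits = defaultdict(set)
    for c, query_ids in cell_to_queries.items():
        for name, p in grid.get(c, ()):
            for qi in query_ids:
                if math.dist(p, queries[qi]) <= radius:
                    hits[qi].add(name)
    return {qi: sorted(hits[qi]) for qi in range(len(queries))}


# Toy database and two simultaneous query sequences (made-up data).
database = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTTTTTTTTT"}
points = {name: to_point(kmer_vector(seq)) for name, seq in database.items()}
grid = build_grid(points)
queries = [to_point(kmer_vector(q)) for q in ("ACGTACGTAC", "TTTTTTTTGT")]
results = multi_point_range_query(grid, queries, radius=0.15)
```

In this sketch, cells shared by several queries are scanned once, which is the kind of saving a combined query aims for. The real SOM mapping and MPRQ algorithm would replace `to_point` and the grid lookup.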
Source Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Date
2007
Type
Conference Paper