Title: Mining mutation chains in biological sequences
Authors: Sheng, C.
Hsu, W. 
Lee, M.L. 
Tong, J.C.
Ng, S.-K.
Issue Date: 2010
Citation: Sheng, C., Hsu, W., Lee, M.L., Tong, J.C., Ng, S.-K. (2010). Mining mutation chains in biological sequences. Proceedings - International Conference on Data Engineering : 473-484. ScholarBank@NUS Repository.
Abstract: The increasing number of infectious disease outbreaks has led to a need for new research to better understand the origins, epidemiological features and pathogenicity of diseases caused by fast-mutating, fast-spreading viruses. Traditional sequence analysis methods do not take into account the spatio-temporal dynamics of rapidly evolving and spreading viral species. They also focus on identifying single-point mutations. In this paper, we propose a novel approach that incorporates space-time relationships for studying changes in protein sequences from fast-mutating viruses. We aim to detect both single-point mutations and k-mutations in the viral sequences. We define the problem of mutation chain pattern mining and design algorithms to discover valid mutation chains. We devise compact data structures to facilitate the mining process and pruning strategies to increase the scalability of the algorithms. Experiments on both synthetic datasets and a real-world influenza A virus dataset show that our algorithms are scalable and effective in discovering mutations that spread geographically over time. © 2010 IEEE.
Source Title: Proceedings - International Conference on Data Engineering
ISBN: 9781424454440
ISSN: 10844627
DOI: 10.1109/ICDE.2010.5447869
Appears in Collections: Staff Publications


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.