Assessment of approximate string matching in a biomedical text retrieval problem | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.compbiomed.2004.06.002

Title:	Assessment of approximate string matching in a biomedical text retrieval problem
Authors:	Wang, J.F.; Li, Z.R.; Cai, C.Z.; Chen, Y.Z.
Keywords:	Bioinformatics; Biomedical; Data mining; Dynamic programming; Herb; Herbal medicine; Literature; Literature search; Medicinal plant; Medicine; Medinformatics; Plant; Smith-Waterman algorithm; Text; Text matching; Word; Word matching
Issue Date:	Oct-2005
Citation:	Wang, J.F., Li, Z.R., Cai, C.Z., Chen, Y.Z. (2005-10). Assessment of approximate string matching in a biomedical text retrieval problem. Computers in Biology and Medicine 35 (8) : 717-724. ScholarBank@NUS Repository. https://doi.org/10.1016/j.compbiomed.2004.06.002
Abstract:	Text-based search is widely used for biomedical data mining and knowledge discovery. Character errors in the literature reduce the accuracy of data mining, and methods for addressing this problem are being explored. This work tests the usefulness of the Smith-Waterman algorithm with an affine gap penalty as a method for biomedical literature retrieval. Names of medicinal herbs collected from the herbal medicine literature are matched with those from the medicinal chemistry literature using this algorithm at different string identity levels (80-100%). Performance is best at a string identity of 88%, at which recall and precision are 96.9% and 97.3%, respectively. Our study suggests that the Smith-Waterman algorithm is useful for improving the success rate of biomedical text retrieval. © 2004 Elsevier Ltd. All rights reserved.
Source Title:	Computers in Biology and Medicine
URI:	http://scholarbank.nus.edu.sg/handle/10635/104737
ISSN:	0010-4825
DOI:	10.1016/j.compbiomed.2004.06.002
Appears in Collections:	Staff Publications
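
The abstract describes matching herb names with the Smith-Waterman algorithm under an affine gap penalty and accepting pairs above a string identity cutoff. The following is a minimal sketch of that idea, not the authors' implementation. The scoring values (match=2, mismatch=-1, gap_open=-2, gap_extend=-1), the function names, the case folding, and the definition of identity (identical aligned characters in one best-scoring local alignment, divided by the length of the longer string) are all assumptions made for illustration; the record does not state them. Only the 88% cutoff comes from the abstract.

```python
NEG_INF = float("-inf")


def _add(state, delta):
    """Add a score delta to a (score, matches) state without changing matches."""
    return (state[0] + delta, state[1])


def local_identity(a, b, match=2, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Smith-Waterman local alignment with affine gaps (Gotoh-style recurrences).

    A gap of length k costs gap_open + (k - 1) * gap_extend (assumed convention).
    Each cell carries (score, matches); ties in score go to the alignment with
    more identical characters. Returns matches / len(longer string), which is
    an assumed definition of "string identity".
    """
    a, b = a.lower(), b.lower()  # assumption: matching is case-insensitive
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)
    zero, none = (0, 0), (NEG_INF, 0)
    h_prev = [zero] * (m + 1)   # best alignment ending at (i-1, j)
    f_prev = [none] * (m + 1)   # best ending with a gap that skips a[i-2]
    best = zero
    for i in range(1, n + 1):
        h_cur = [zero] * (m + 1)
        f_cur = [none] * (m + 1)
        e = none                # best ending with a gap that skips b[j-1]
        for j in range(1, m + 1):
            e = max(_add(h_cur[j - 1], gap_open), _add(e, gap_extend))
            f_cur[j] = max(_add(h_prev[j], gap_open), _add(f_prev[j], gap_extend))
            same = a[i - 1] == b[j - 1]
            diag = (h_prev[j - 1][0] + (match if same else mismatch),
                    h_prev[j - 1][1] + int(same))
            cand = max(diag, e, f_cur[j])
            # Local alignment: restart from an empty alignment if score <= 0.
            h_cur[j] = cand if cand[0] > 0 else zero
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best[1] / max(n, m)


def names_match(a, b, threshold=0.88):
    """Accept a pair when identity reaches the cutoff (0.88 is the optimum
    string identity level reported in the abstract)."""
    return local_identity(a, b) >= threshold


if __name__ == "__main__":
    for x, y in [("Panax ginseng", "panax ginsemg"),
                 ("Panax ginseng", "panaxginseng"),
                 ("Panax ginseng", "Ginkgo biloba")]:
        print(x, "|", y, "|", round(local_identity(x, y), 3), names_match(x, y))
```

Under these assumed settings, a single substituted or dropped character in a 13-character name still clears the 0.88 cutoff, while unrelated names do not. Changing the scoring values or the identity denominator would shift where that boundary falls.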

