Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/113662
Title: Supporting the curation of biological databases with reusable text mining
Authors: Miotto, O. 
Tan, T.W. 
Brusic, V.
Issue Date: 2005
Citation: Miotto, O., Tan, T.W., Brusic, V. (2005). Supporting the curation of biological databases with reusable text mining. Genome informatics. International Conference on Genome Informatics. 16(2): 32-44. ScholarBank@NUS Repository.
Abstract: Curators of biological databases transfer knowledge from scientific publications, a laborious and expensive manual process. Machine learning algorithms can reduce the workload of curators by filtering relevant biomedical literature, though their widespread adoption will depend on the availability of intuitive tools that can be configured for a variety of tasks. We propose a new method for supporting curators by means of document categorization, and describe the architecture of a curator-oriented tool implementing this method using techniques that require no computational linguistic or programming expertise. To demonstrate the feasibility of this approach, we prototyped an application of this method to support a real curation task: identifying PubMed abstracts that contain allergen cross-reactivity information. We tested the performance of two different classifier algorithms (CART and ANN), applied to both composite and single-word features, using several feature scoring functions. Both classifiers exceeded our performance targets, the ANN classifier yielding the best results. These results show that the method we propose can deliver the level of performance needed to assist database curation.
Source Title: Genome informatics. International Conference on Genome Informatics.
URI: http://scholarbank.nus.edu.sg/handle/10635/113662
ISSN: 09199454
Appears in Collections: Staff Publications



