A unified tagging approach to text normalization | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/41982

Title:	A unified tagging approach to text normalization
Authors:	Zhu, C.; Tang, J.; Li, H.; Ng, H.T.; Zhao, T.
Issue Date:	2007
Citation:	Zhu, C., Tang, J., Li, H., Ng, H.T., Zhao, T. (2007). A unified tagging approach to text normalization. ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics: 688-695. ScholarBank@NUS Repository.
Abstract:	This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting 'informally inputted' text into canonical form by eliminating 'noises' in the text and detecting paragraph and sentence boundaries in the text. Previously, text normalization issues were often handled in an ad-hoc fashion or studied separately. This paper first gives a formalization of the entire problem. It then proposes a unified tagging approach to perform the task using Conditional Random Fields (CRF). The paper shows that with the introduction of a small set of tags, most of the text normalization tasks can be performed within the approach. The proposed method achieves high accuracy because the subtasks of normalization are interdependent and are performed together. Experimental results on email data cleaning show that the proposed method significantly outperforms both the approach of using cascaded models and that of employing independent models. © 2007 Association for Computational Linguistics.
Source Title:	ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
URI:	http://scholarbank.nus.edu.sg/handle/10635/41982
ISBN:	9781932432862
Appears in Collections:	Staff Publications
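The abstract's central idea is that once every token carries one tag from a small tag set, noise removal and paragraph/sentence boundary detection can all be read off a single tag sequence. The sketch below illustrates only that decoding step, under stated assumptions: the tag names (KEEP, DEL, EOS, EOP) and the functions `apply_tags` and `render` are hypothetical and are not the tag set or code from the paper, and the tags are written by hand rather than predicted by a CRF.

```python
from typing import List

# Hypothetical tag set for illustration only; NOT the tag set used in the paper.
KEEP = "KEEP"  # keep the token as is
DEL = "DEL"    # delete the token as noise (e.g. a stray '>' quote marker)
EOS = "EOS"    # keep the token; a sentence ends after it
EOP = "EOP"    # keep the token; a sentence and a paragraph end after it
TAGS = {KEEP, DEL, EOS, EOP}


def apply_tags(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Decode one tag per token into paragraphs, each a list of sentences."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    paragraphs: List[List[str]] = []
    sentences: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag not in TAGS:
            raise ValueError(f"unknown tag: {tag!r}")
        if tag == DEL:
            continue  # noise removal: the token is dropped
        current.append(token)
        if tag in (EOS, EOP):  # sentence boundary after this token
            sentences.append(" ".join(current))
            current = []
        if tag == EOP:  # paragraph boundary after this token
            paragraphs.append(sentences)
            sentences = []
    # Flush any text left after the last boundary tag.
    if current:
        sentences.append(" ".join(current))
    if sentences:
        paragraphs.append(sentences)
    return paragraphs


def render(paragraphs: List[List[str]]) -> str:
    """Join sentences with spaces and paragraphs with blank lines."""
    return "\n\n".join(" ".join(p) for p in paragraphs)


# A hand-tagged, informally typed email fragment (hypothetical example data).
tokens = ["Hi", "John,", ">", "The", "meeting", "is", "at", "3pm.",
          "See", "you", "there.", ">", "Thanks."]
tags = ["KEEP", "EOP", "DEL", "KEEP", "KEEP", "KEEP", "KEEP", "EOS",
        "KEEP", "KEEP", "EOP", "DEL", "EOP"]
print(render(apply_tags(tokens, tags)))
```

The design point this is meant to mirror is that a single label per token encodes several normalization decisions at once, so one sequence labeler can make them jointly. In the approach the abstract describes, a Conditional Random Field would assign the tags, letting evidence about noise and boundaries inform each other instead of passing errors down a cascade of separate models.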




Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.