Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/78109
DC Field | Value | |
---|---|---|
dc.title | Dynamic conditional random fields for joint sentence boundary and punctuation prediction | |
dc.contributor.author | Wang, X. | |
dc.contributor.author | Ng, H.T. | |
dc.contributor.author | Sim, K.C. | |
dc.date.accessioned | 2014-07-04T03:12:32Z | |
dc.date.available | 2014-07-04T03:12:32Z | |
dc.date.issued | 2012 | |
dc.identifier.citation | Wang, X., Ng, H.T., Sim, K.C. (2012). Dynamic conditional random fields for joint sentence boundary and punctuation prediction. 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 2 : 1382-1385. ScholarBank@NUS Repository. | |
dc.identifier.isbn | 9781622767595 | |
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/78109 | |
dc.description.abstract | The use of dynamic conditional random fields (DCRF) has been shown to outperform linear-chain conditional random fields (LCRF) for punctuation prediction on conversational speech texts [1]. In this paper, we combine lexical, prosodic, and modified n-gram score features into the DCRF framework for a joint sentence boundary and punctuation prediction task on TDT3 English broadcast news. We show that the joint prediction method outperforms the conventional two-stage method using LCRF or a maximum entropy model (MaxEnt). We show the importance of various features using DCRF, LCRF, MaxEnt, and a hidden-event n-gram model (HEN), respectively. In addition, we address the practical issue of feature explosion by introducing lexical pruning, which reduces model size and improves the F1-measure. We adopt incremental local training to overcome memory size limitations without incurring a significant performance penalty. Our results show that adding prosodic and n-gram score features gives about 20% relative error reduction in all cases. Overall, DCRF gives the best accuracy, followed by LCRF, MaxEnt, and HEN. | |
dc.source | Scopus | |
dc.subject | Dynamic conditional random fields | |
dc.subject | Punctuation | |
dc.subject | Sentence boundary detection | |
dc.type | Conference Paper | |
dc.contributor.department | COMPUTER SCIENCE | |
dc.description.sourcetitle | 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 | |
dc.description.volume | 2 | |
dc.description.page | 1382-1385 | |
dc.identifier.isiut | NOT_IN_WOS | |
Appears in Collections: | Staff Publications |