|Title:||Dynamic conditional random fields for joint sentence boundary and punctuation prediction|
|Authors:||Wang, X.; Ng, H.T.; Sim, K.C.|
|Keywords:||Dynamic conditional random fields; Sentence boundary detection|
|Issue Date:||2012|
|Citation:||Wang, X., Ng, H.T., Sim, K.C. (2012). Dynamic conditional random fields for joint sentence boundary and punctuation prediction. 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 2: 1382-1385. ScholarBank@NUS Repository.|
|Abstract:||The use of dynamic conditional random fields (DCRF) has been shown to outperform linear-chain conditional random fields (LCRF) for punctuation prediction on conversational speech texts. In this paper, we combine lexical, prosodic, and modified n-gram score features into the DCRF framework for a joint sentence boundary and punctuation prediction task on TDT3 English broadcast news. We show that the joint prediction method outperforms the conventional two-stage method using LCRF or a maximum entropy (MaxEnt) model. We show the importance of various features using DCRF, LCRF, MaxEnt, and the hidden-event n-gram model (HEN), respectively. In addition, we address the practical issue of feature explosion by introducing lexical pruning, which reduces model size and improves the F1-measure. We adopt incremental local training to overcome memory size limitations without incurring a significant performance penalty. Our results show that adding prosodic and n-gram score features gives about 20% relative error reduction in all cases. Overall, DCRF gives the best accuracy, followed by LCRF, MaxEnt, and HEN.|
|Source Title:||13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012|
|URI:||http://scholarbank.nus.edu.sg/handle/10635/78109|
|ISBN:||9781622767595|
|Appears in Collections:||Staff Publications|
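
The abstract reports results as an F1-measure and as a relative error reduction. The sketch below shows how these two quantities are commonly computed. It is illustrative only: the function names and every number in it are hypothetical, and the record does not state which error measure the paper uses, so the error rate is treated here as a generic input.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(baseline_error: float, improved_error: float) -> float:
    """Fraction of the baseline error removed by the improved system.

    Which error measure the paper uses is not stated in this record;
    any error rate in (0, 1] can be passed in.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - improved_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper.
    print(f"F1: {f1_measure(0.8, 0.6):.3f}")
    print(f"Relative error reduction: {relative_error_reduction(0.40, 0.32):.0%}")
```

With these made-up inputs, an error rate falling from 0.40 to 0.32 corresponds to a 20% relative reduction, the same order of magnitude as the figure quoted in the abstract.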