Title: Combining Speech with textual methods for Arabic diacritization
Keywords: Arabic, NLP, speech, diacritics, interpolation, multi-modal
Issue Date: 20-Jan-2012
Citation: AISHA SIDDIQA AZIM (2012-01-20). Combining Speech with textual methods for Arabic diacritization. ScholarBank@NUS Repository.
Abstract: Most studies on Arabic diacritization have relied on textually inferred features alone. This thesis proposes a novel approach in which speech is combined, by weighting, with a text-based model, so that linguistically insensitive acoustic information can correct and complement the text model's diacritic predictions. The acoustic model is based on Hidden Markov Models and the textual model on Conditional Random Fields. The combination significantly reduces error rates across all metrics, especially for case endings, which are the most difficult to predict. It outperforms conventional methods, with diacritic and word error rates of 1.6 and 5.2, respectively, inclusive of case endings, and 1.0 and 3.0 exclusive of them. Additionally, the diacritized solutions provided by two of the most popular morphological tools in Arabic NLP are compared in the context of the combined system.
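The weighted combination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact weighting scheme, so the code below assumes a plain linear interpolation of two per-character probability distributions over diacritic labels. The function names, the label names, the weight, and all probabilities are hypothetical and chosen only for illustration; the HMM and CRF models themselves are not implemented.

```python
from typing import Dict

# Hypothetical representation: diacritic label -> probability for one character.
Dist = Dict[str, float]


def interpolate(acoustic: Dist, textual: Dist, weight: float) -> Dist:
    """Linearly interpolate two distributions.

    `weight` is the share given to the acoustic model (assumed to lie in [0, 1]);
    the textual model receives the remaining 1 - weight.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    labels = set(acoustic) | set(textual)
    combined = {
        label: weight * acoustic.get(label, 0.0)
        + (1.0 - weight) * textual.get(label, 0.0)
        for label in labels
    }
    total = sum(combined.values())
    if total <= 0.0:
        raise ValueError("combined distribution has no probability mass")
    # Renormalize in case the inputs did not each sum exactly to 1.
    return {label: p / total for label, p in combined.items()}


def pick_diacritic(acoustic: Dist, textual: Dist, weight: float) -> str:
    """Return the label with the highest interpolated probability."""
    combined = interpolate(acoustic, textual, weight)
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(combined), key=lambda label: combined[label])


# Made-up scores for a single character; not taken from the thesis.
textual_scores = {"fatha": 0.5, "damma": 0.4, "kasra": 0.1}
acoustic_scores = {"fatha": 0.2, "damma": 0.7, "kasra": 0.1}
choice = pick_diacritic(acoustic_scores, textual_scores, weight=0.5)
```

With these invented numbers the text model alone would choose "fatha", while an equal-weight combination shifts the choice to "damma", which is the kind of correction by acoustic evidence the abstract describes. In practice the weight would be tuned on held-out data.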
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: AishaSA_MScThesis.pdf
Size: 2.39 MB
Format: Adobe PDF





Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.