Combining Speech with Textual Methods for Arabic Diacritization

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/34443

Title:	Combining Speech with Textual Methods for Arabic Diacritization
Authors:	AISHA SIDDIQA AZIM
Keywords:	Arabic, NLP, speech, diacritics, interpolation, multi-modal
Issue Date:	20-Jan-2012
Citation:	AISHA SIDDIQA AZIM (2012-01-20). Combining Speech with Textual Methods for Arabic Diacritization. ScholarBank@NUS Repository.
Abstract:	Most studies on Arabic diacritization have relied on textually inferred features alone. This thesis proposes a novel approach in which speech is combined, with weighting, with a text-based model, so that linguistically insensitive acoustic information can correct and complement the text model's diacritic predictions. The acoustic model is based on Hidden Markov Models and the textual model on Conditional Random Fields. The combination significantly reduces error rates across all metrics, especially for case endings, which are the most difficult to predict. It gives results superior to those of conventional methods, with diacritic and word error rates of 1.6 and 5.2, respectively, when case endings are included, and 1.0 and 3.0 when they are excluded. Additionally, the diacritized solutions provided by two of the most popular morphological tools in Arabic NLP are compared in the context of the combined system.
URI:	http://scholarbank.nus.edu.sg/handle/10635/34443
Appears in Collections:	Master's Theses (Open)
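
The weighted combination described in the abstract can be illustrated with a minimal sketch. It assumes that each model outputs a probability distribution over candidate diacritics for every character position and that the two distributions are mixed by simple linear interpolation. The abstract does not specify the thesis's actual combination scheme, so the function names, the weight value, and the toy probabilities below are all hypothetical.

```python
from typing import Dict, List

# A distribution maps a diacritic label to its probability.
Dist = Dict[str, float]


def interpolate(text_probs: Dist, speech_probs: Dist, weight: float) -> Dist:
    """Linearly mix two distributions; `weight` is the share given to the text model."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    labels = set(text_probs) | set(speech_probs)
    return {
        label: weight * text_probs.get(label, 0.0)
        + (1.0 - weight) * speech_probs.get(label, 0.0)
        for label in labels
    }


def combine_sequence(
    text_seq: List[Dist], speech_seq: List[Dist], weight: float
) -> List[str]:
    """Pick the highest-scoring diacritic at each position after mixing."""
    if len(text_seq) != len(speech_seq):
        raise ValueError("both models must score the same positions")
    result = []
    for text_probs, speech_probs in zip(text_seq, speech_seq):
        mixed = interpolate(text_probs, speech_probs, weight)
        # Iterating over sorted labels makes tie-breaking deterministic.
        result.append(max(sorted(mixed), key=mixed.get))
    return result


# Hypothetical case-ending position: the text model slightly prefers "u",
# while the acoustic model strongly prefers "a".
text_scores = [{"a": 0.4, "u": 0.5, "i": 0.1}]
speech_scores = [{"a": 0.8, "u": 0.15, "i": 0.05}]
prediction = combine_sequence(text_scores, speech_scores, weight=0.5)
```

With equal weighting in this toy example, the acoustic evidence overrides the text model's choice at the case-ending position; setting `weight=1.0` falls back to the text model alone.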

File	Description	Size	Format	Access Settings	Version
AishaSA_MScThesis.pdf		2.39 MB	Adobe PDF	OPEN	None
