Title: Lattice-based statistical spoken document retrieval
Keywords: information retrieval, language modeling, lattice-based spoken document retrieval, probabilistic IR, conversational telephone speech, query by example
Issue Date: 19-May-2009
Citation: CHIA TEE KIAH @ XIE ZHIJIA (2009-05-19). Lattice-based statistical spoken document retrieval. ScholarBank@NUS Repository.
Abstract: Recent research efforts on spoken document retrieval (SDR) have tried to overcome the low quality of 1-best automatic speech recognition transcripts -- especially for conversational speech -- by using statistics derived from speech lattices containing multiple transcription hypotheses as output by a speech recognizer. However, these efforts have invariably used the classical vector space retrieval model. In this thesis, I present a lattice-based SDR method based on a statistical approach to information retrieval. I formulate a way to estimate statistical models for documents from expected word counts derived from lattices; query-document relevance is computed as a log probability under such models. Experiments show that my method outperforms statistical retrieval using 1-best transcripts, a recent lattice-based vector space method, and BM25 using lattice statistics. I also extend my proposed SDR method to the task of query-by-example SDR -- retrieving documents from a speech corpus, where the queries are themselves full-fledged spoken documents (query exemplars).
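The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes a unigram document model with Dirichlet smoothing against collection statistics; the abstract does not specify the estimator or smoothing used in the thesis, so that choice, the function names, the `mu` value, and the toy expected counts below are all hypothetical and chosen only for illustration.

```python
import math

def make_document_model(expected_counts, collection_counts, mu=2.0):
    """Return a function giving a smoothed P(word | document).

    expected_counts: word -> expected count in this document's lattice
    (fractional, because competing hypotheses share probability mass).
    Dirichlet smoothing is an assumption, not necessarily the thesis's choice.
    """
    doc_len = sum(expected_counts.values())
    coll_len = sum(collection_counts.values())

    def prob(word):
        p_coll = collection_counts.get(word, 0.0) / coll_len
        return (expected_counts.get(word, 0.0) + mu * p_coll) / (doc_len + mu)

    return prob

def log_query_likelihood(query_words, prob):
    """Sum of log P(word | document); words unseen in the collection are skipped."""
    total = 0.0
    for word in query_words:
        p = prob(word)
        if p > 0.0:
            total += math.log(p)
    return total

# Toy expected counts (hypothetical values, not taken from the thesis).
doc_a = {"budget": 0.8, "budge": 0.2, "meeting": 1.9}
doc_b = {"weather": 1.7, "budget": 0.1, "meeting": 0.4}

# Collection statistics are the summed expected counts over all documents.
collection = {}
for doc in (doc_a, doc_b):
    for word, count in doc.items():
        collection[word] = collection.get(word, 0.0) + count

query = ["budget", "meeting"]
score_a = log_query_likelihood(query, make_document_model(doc_a, collection))
score_b = log_query_likelihood(query, make_document_model(doc_b, collection))
ranking = sorted([("doc_a", score_a), ("doc_b", score_b)],
                 key=lambda item: item[1], reverse=True)
```

Because the counts are expectations over lattice hypotheses, a word that appears only in lower-ranked hypotheses still contributes a fractional count to the document model, which a 1-best transcript would miss entirely.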
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: ChiaTK.pdf
Size: 925.67 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.