Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/15849
Title: Lattice-based statistical spoken document retrieval
Authors: CHIA TEE KIAH @ XIE ZHIJIA
Keywords: information retrieval, language modeling, lattice-based spoken document retrieval, probabilistic IR, conversational telephone speech, query by example
Issue Date: 19-May-2009
Citation: CHIA TEE KIAH @ XIE ZHIJIA (2009-05-19). Lattice-based statistical spoken document retrieval. ScholarBank@NUS Repository.
Abstract: Recent research efforts on spoken document retrieval (SDR) have tried to overcome the low quality of 1-best automatic speech recognition transcripts -- especially for conversational speech -- by using statistics derived from speech lattices containing multiple transcription hypotheses as output by a speech recognizer. However, these efforts have invariably used the classical vector space retrieval model. In this thesis, I present a lattice-based SDR method based on a statistical approach to information retrieval. I formulate a way to estimate statistical models for documents from expected word counts derived from lattices; query-document relevance is computed as a log probability under such models. Experiments show that my method outperforms statistical retrieval using 1-best transcripts, a recent lattice-based vector space method, and BM25 using lattice statistics. I also extend my proposed SDR method to the task of query-by-example SDR -- retrieving documents from a speech corpus, where the queries are themselves full-fledged spoken documents (query exemplars).
URI: http://scholarbank.nus.edu.sg/handle/10635/15849
Appears in Collections: Ph.D Theses (Open)
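
Illustrative sketch (not taken from the thesis): the abstract describes estimating a statistical document model from expected word counts derived from lattices, then scoring query-document relevance as a log probability under that model. The Python below shows one way this could look under simplifying assumptions. A lattice is approximated here by a short list of weighted transcription hypotheses, the document model is a unigram model, and Dirichlet smoothing against collection statistics is assumed. The names `expected_counts`, `query_log_prob`, and the parameter `mu` are hypothetical; the thesis's actual estimation and smoothing choices are not stated in this record.

```python
import math
from collections import Counter


def expected_counts(hypotheses):
    """Expected word counts from weighted hypotheses [(words, posterior), ...].

    Assumption: a real lattice would be summarized with forward-backward
    posteriors; a weighted hypothesis list stands in for it here.
    """
    total = sum(p for _, p in hypotheses)
    counts = Counter()
    for words, p in hypotheses:
        for w in words:
            counts[w] += p / total  # each occurrence contributes its posterior mass
    return counts


def query_log_prob(query, doc_counts, coll_counts, mu=1.0):
    """Log probability of the query under a smoothed unigram document model.

    Assumption: Dirichlet smoothing with a toy prior weight `mu`.
    """
    doc_len = sum(doc_counts.values())
    coll_len = sum(coll_counts.values())
    score = 0.0
    for w in query:
        p_coll = coll_counts.get(w, 0.0) / coll_len
        p = (doc_counts.get(w, 0.0) + mu * p_coll) / (doc_len + mu)
        if p == 0.0:
            return float("-inf")  # word unseen in the whole collection
        score += math.log(p)
    return score


# Toy data: document A has two competing hypotheses, document B has one.
doc_a = expected_counts([
    (["cheap", "flights", "to", "paris"], 0.6),
    (["cheap", "lights", "to", "paris"], 0.4),
])
doc_b = expected_counts([(["weather", "in", "paris"], 1.0)])
collection = doc_a + doc_b

query = ["flights", "paris"]
score_a = query_log_prob(query, doc_a, collection)
score_b = query_log_prob(query, doc_b, collection)
print(score_a, score_b)
```

In this toy setup, "flights" appears only in the more probable of document A's two hypotheses, so it enters the model with a fractional expected count rather than being kept or dropped outright as it would be with a 1-best transcript.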

Files in This Item:
File: ChiaTK.pdf
Size: 925.67 kB
Format: Adobe PDF
Access Settings: OPEN
Version: None



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.