Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/15849
Title: | Lattice-based statistical spoken document retrieval |
Authors: | CHIA TEE KIAH @ XIE ZHIJIA |
Keywords: | information retrieval, language modeling, lattice-based spoken document retrieval, probabilistic IR, conversational telephone speech, query by example |
Issue Date: | 19-May-2009 |
Citation: | CHIA TEE KIAH @ XIE ZHIJIA (2009-05-19). Lattice-based statistical spoken document retrieval. ScholarBank@NUS Repository. |
Abstract: | Recent research efforts on spoken document retrieval (SDR) have tried to overcome the low quality of 1-best automatic speech recognition transcripts -- especially for conversational speech -- by using statistics derived from speech lattices containing multiple transcription hypotheses as output by a speech recognizer. However, these efforts have invariably used the classical vector space retrieval model. In this thesis, I present a lattice-based SDR method based on a statistical approach to information retrieval. I formulate a way to estimate statistical models for documents from expected word counts derived from lattices; query-document relevance is computed as a log probability under such models. Experiments show that my method outperforms statistical retrieval using 1-best transcripts, a recent lattice-based vector space method, and BM25 using lattice statistics. I also extend my proposed SDR method to the task of query-by-example SDR -- retrieving documents from a speech corpus, where the queries are themselves full-fledged spoken documents (query exemplars). |
URI: | http://scholarbank.nus.edu.sg/handle/10635/15849 |
Appears in Collections: | Ph.D Theses (Open) |
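
The abstract describes estimating document models from expected word counts and scoring relevance as a log probability, but gives no formulas. The following is a minimal sketch of that general idea, not the thesis's actual estimator. It assumes a lattice can be approximated by a short list of weighted hypotheses whose posteriors sum to 1, and it assumes Dirichlet-smoothed query likelihood as the scoring rule. The function names, the `mu` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def expected_counts(hypotheses):
    """Expected word counts from (posterior, words) pairs.

    Assumption: the lattice is approximated by a list of hypotheses
    whose posterior probabilities sum to 1.
    """
    counts = Counter()
    for posterior, words in hypotheses:
        for word in words:
            counts[word] += posterior
    return counts

def log_query_likelihood(query, doc_counts, collection_counts, mu=10.0):
    """Log probability of the query under a Dirichlet-smoothed unigram model.

    Assumption: Dirichlet smoothing with prior weight mu; the abstract
    does not say which smoothing scheme the thesis uses.
    """
    doc_len = sum(doc_counts.values())
    coll_len = sum(collection_counts.values())
    score = 0.0
    for word in query:
        p_coll = collection_counts.get(word, 0.0) / coll_len
        p = (doc_counts.get(word, 0.0) + mu * p_coll) / (doc_len + mu)
        if p == 0.0:
            return float("-inf")  # word unseen in the whole collection
        score += math.log(p)
    return score

# Toy data: document A has two competing transcription hypotheses.
doc_a = expected_counts([
    (0.6, ["cheap", "flights", "today"]),
    (0.4, ["sheep", "flights", "today"]),
])
doc_b = expected_counts([(1.0, ["weather", "report", "today"])])
collection = doc_a + doc_b

query = ["cheap", "flights"]
score_a = log_query_likelihood(query, doc_a, collection)
score_b = log_query_likelihood(query, doc_b, collection)
```

In this toy setup the word "cheap" keeps a fractional expected count of 0.6 in document A even though a competing hypothesis says "sheep", so document A ranks above document B for the query. A real system would compute expected counts from lattice arc posteriors rather than from an explicit hypothesis list.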
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
ChiaTK.pdf | | 925.67 kB | Adobe PDF | OPEN | None | View/Download |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.