Combining text and audio-visual features in video indexing

Please use this identifier to cite or link to this item: https://doi.org/10.1109/ICASSP.2005.1416476

Title:	Combining text and audio-visual features in video indexing
Authors:	Chang, S.-F.; Manmatha, R.; Chua, T.-S.
Issue Date:	2005
Citation:	Chang, S.-F., Manmatha, R., Chua, T.-S. (2005). Combining text and audio-visual features in video indexing. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings V : V1005-V1008. ScholarBank@NUS Repository. https://doi.org/10.1109/ICASSP.2005.1416476
Abstract:	We discuss the opportunities, state of the art, and open research issues in using multi-modal features in video indexing. Specifically, we focus on how imperfect text data obtained by automatic speech recognition (ASR) may be used to help solve challenging problems, such as story segmentation, concept detection, retrieval, and topic clustering. We review the frameworks and machine learning techniques that are used to fuse the text features with audio-visual features. Case studies showing promising performance will be described, primarily in the broadcast news video domain. © 2005 IEEE.
Source Title:	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
URI:	http://scholarbank.nus.edu.sg/handle/10635/41835
ISBN:	0780388747
ISSN:	1520-6149
DOI:	10.1109/ICASSP.2005.1416476
Appears in Collections:	Staff Publications

