Vocabulary filtering for termweighting in archived question search

Please use this identifier to cite or link to this item: https://doi.org/10.1007/978-3-642-13657-3_42

DC Field	Value
dc.title	Vocabulary filtering for termweighting in archived question search
dc.contributor.author	Ming, Z.-Y.
dc.contributor.author	Wang, K.
dc.contributor.author	Chua, T.-S.
dc.date.accessioned	2013-07-04T08:14:58Z
dc.date.available	2013-07-04T08:14:58Z
dc.date.issued	2010
dc.identifier.citation	Ming, Z.-Y., Wang, K., Chua, T.-S. (2010). Vocabulary filtering for termweighting in archived question search. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6118 LNAI (PART 1) : 383-390. ScholarBank@NUS Repository. https://doi.org/10.1007/978-3-642-13657-3_42
dc.identifier.isbn	3642136567
dc.identifier.issn	03029743
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/40900
dc.description.abstract	This paper proposes the notion of vocabulary filtering in a term weighting framework that consists of three filters: at the document level, the collection level, and the vocabulary level. While term frequency and document frequency, along with their variations, are the dominant term weighting factors at the document level and collection level respectively, vocabulary-level factors are seldom considered in current models. In a way, stopword removal can be seen as a vocabulary-level filter, but it is not well integrated into current term weighting models. In this paper, we propose a vocabulary filtering and multi-level term weighting model by integrating a point-wise divergence-based measure into the commonly used TF-IDF model. With our proposed model, the specificity of the vocabulary is captured as a new factor in term weighting, and stopwords are naturally handled within the model rather than being removed according to a separately constructed list. Experiments conducted on searching for similar questions in a large community-based question answering archive show that: (a) our proposed term weighting model with multiple levels is consistently better than single-level models on the retrieval task; (b) the proposed vocabulary filter distinguishes salient terms from trivial ones well, and can be used to construct stopword lists. © 2010 Springer-Verlag Berlin Heidelberg.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1007/978-3-642-13657-3_42
dc.source	Scopus
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1007/978-3-642-13657-3_42
dc.description.sourcetitle	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.description.volume	6118 LNAI
dc.description.issue	PART 1
dc.description.page	383-390
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications
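
The abstract describes a three-level weight (document, collection, vocabulary) but this record does not give the formula. The Python sketch below is one possible reading, not the paper's model: it multiplies TF and a smoothed IDF by a vocabulary-level factor, assumed here to be the point-wise KL contribution p_fg * log(p_fg / p_bg) of a term in the question archive against a background corpus, clipped at zero. The function names, the background counts, the smoothing and the clipping are all illustrative assumptions.

```python
import math
from collections import Counter


def vocabulary_factor(term, fg_counts, bg_counts, vocab_size, smoothing=1.0):
    """Assumed vocabulary-level filter: clipped point-wise KL contribution.

    fg_counts and bg_counts must be Counters so unseen terms count as zero.
    """
    fg_total = sum(fg_counts.values())
    bg_total = sum(bg_counts.values())
    # Additive smoothing keeps both probabilities strictly positive.
    p_fg = (fg_counts[term] + smoothing) / (fg_total + smoothing * vocab_size)
    p_bg = (bg_counts[term] + smoothing) / (bg_total + smoothing * vocab_size)
    # Terms no more typical of the archive than of the background get zero,
    # which is how stopword-like terms drop out without a separate list.
    return max(0.0, p_fg * math.log(p_fg / p_bg))


def term_weights(doc_tokens, collection, bg_counts):
    """Hypothetical multi-level weight: TF * smoothed IDF * vocabulary factor."""
    fg_counts = Counter(t for doc in collection for t in doc)
    doc_freq = Counter(t for doc in collection for t in set(doc))
    vocab_size = len(set(fg_counts) | set(bg_counts))
    n_docs = len(collection)

    weights = {}
    for term, freq in Counter(doc_tokens).items():
        # Smoothed IDF (an assumption; any standard IDF variant would do).
        idf = math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
        vocab = vocabulary_factor(term, fg_counts, bg_counts, vocab_size)
        weights[term] = freq * idf * vocab
    return weights


# Toy archive of tokenized questions and made-up background frequencies.
questions = [
    ["how", "to", "fix", "the", "printer"],
    ["how", "to", "clean", "the", "printer"],
    ["what", "is", "the", "best", "printer"],
]
background = Counter({
    "the": 300, "to": 200, "is": 150, "how": 50, "what": 50,
    "best": 20, "fix": 10, "clean": 10, "printer": 10,
})
weights = term_weights(questions[0], questions, background)
```

In this toy data, "the" and "to" are relatively more frequent in the background than in the archive, so their vocabulary factor is clipped to zero, while "printer" and "fix" keep positive weights. Under this reading, a stopword list could be built by collecting the terms whose vocabulary factor is zero.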
