Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/14686
Title: Topic detection using maximal frequent sequences
Authors: YAP YANG LENG, IVAN
Keywords: Topic Detection, Document Clustering, Maximal Frequent Sequences, Equivalence Classes
Issue Date: 18-Mar-2005
Source: YAP YANG LENG, IVAN (2005-03-18). Topic detection using maximal frequent sequences. ScholarBank@NUS Repository.
Abstract: When analyzing a document collection, a key detail is the number of distinct topics it contains. Traditional clustering-based methods for topic detection do not take word sequence into account and are usually not equipped to describe topic content. We present a new method that addresses both limitations by using Maximal Frequent word Sequences (MFSs) as building blocks for identifying distinct topics. Our method is a hybrid of an existing algorithm that discovers equivalence classes containing MFSs and a heuristic that groups equivalence classes into topic clusters. The results of applying our method to a collection of newswire articles and Manufacturing technical paper abstracts suggest that it favors datasets whose topics are specific and conceptually well separated. For a dataset whose topics are not clearly defined, our method is still useful for generating a list of distinct topics, which serves as an intermediate result toward understanding and further partitioning the dataset.
URI: http://scholarbank.nus.edu.sg/handle/10635/14686
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: YapYLI.pdf
Size: 359.3 kB
Format: Adobe PDF
Access Settings: OPEN
Version: None


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.