Title: Topic detection using maximal frequent sequences
Keywords: Topic Detection, Document Clustering, Maximal Frequent Sequences, Equivalence Classes
Issue Date: 18-Mar-2005
Citation: YAP YANG LENG, IVAN (2005-03-18). Topic detection using maximal frequent sequences. ScholarBank@NUS Repository.
Abstract: When analyzing document collections, a key detail is the number of distinct topics they contain. Traditional clustering-based methods for topic detection do not take word sequence into account, and are usually not equipped to describe topic content. We present a new method to address this: we use Maximal Frequent word Sequences (MFSs) as building blocks for identifying distinct topics. Our method is a hybrid of an existing algorithm that discovers equivalence classes containing MFSs and a heuristic that groups equivalence classes into topic clusters. The results of applying our method to a collection of newswire articles and Manufacturing technical paper abstracts suggest that our method favors datasets whose topics are specific and conceptually well separated. Our method is also useful for generating a list of distinct topics from a dataset whose topics are not clearly defined; this list acts as an intermediate result toward understanding and further partitioning the dataset.
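To make the term "maximal frequent sequence" concrete, the following is a minimal illustrative sketch and not the algorithm used in the thesis. It assumes a simplified definition: a word sequence is frequent if it occurs contiguously in at least `min_support` documents, and maximal if no longer frequent sequence contains it. The function names, the `max_len` cap, and the toy documents are all hypothetical, and the brute-force enumeration is only practical for tiny inputs.

```python
from collections import defaultdict


def frequent_sequences(docs, min_support, max_len=5):
    """Count, for every contiguous word n-gram up to max_len words,
    how many documents contain it; keep those meeting min_support."""
    support = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                support[tuple(words[i:i + n])].add(doc_id)
    return {seq: len(ids) for seq, ids in support.items()
            if len(ids) >= min_support}


def is_contiguous_subsequence(short, long):
    """True if the tuple `short` appears as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_frequent_sequences(docs, min_support, max_len=5):
    """Keep only frequent sequences not contained in a longer frequent one.
    Assumption: maximality is judged only among sequences of <= max_len words."""
    freq = frequent_sequences(docs, min_support, max_len)
    return {
        seq: count
        for seq, count in freq.items()
        if not any(len(other) > len(seq) and is_contiguous_subsequence(seq, other)
                   for other in freq)
    }


# Toy collection (hypothetical): two documents per topic.
docs = [
    "central bank raises interest rates again",
    "the central bank raises interest rates",
    "striker scores late winning goal",
    "young striker scores late winning goal",
]
mfs = maximal_frequent_sequences(docs, min_support=2)
for seq, count in mfs.items():
    print(" ".join(seq), count)
```

On this toy input, each topic is summarized by one long shared phrase rather than by many overlapping shorter ones, which is the intuition behind using such sequences as descriptive building blocks for topics.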
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: YapYLI.pdf
Size: 359.3 kB
Format: Adobe PDF





Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.