Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/14686
Title: | Topic detection using maximal frequent sequences |
Authors: | YAP YANG LENG, IVAN |
Keywords: | Topic Detection, Document Clustering, Maximal Frequent Sequences, Equivalence Classes |
Issue Date: | 18-Mar-2005 |
Citation: | YAP YANG LENG, IVAN (2005-03-18). Topic detection using maximal frequent sequences. ScholarBank@NUS Repository. |
Abstract: | When analyzing document collections, a key detail is the number of distinct topics contained within. Traditional clustering-based methods that perform topic detection do not take into account word sequence, and usually are not equipped to describe topic content. We present a new method to address this; we use Maximal Frequent word Sequences (MFSs) as building blocks in identifying distinct topics. Our method is a hybrid of an existing algorithm to discover equivalence classes containing MFSs, and a heuristic to group equivalence classes into topic clusters. The results of applying our method to a collection of newswire articles and Manufacturing technical paper abstracts suggest that our method favors datasets whose topics are specific and conceptually well separated. Our method is also useful in generating a list of distinct topics, from a dataset whose topics are not clearly defined, which acts as an intermediate result to understanding and further partitioning the dataset. |
URI: | http://scholarbank.nus.edu.sg/handle/10635/14686 |
Appears in Collections: | Master's Theses (Open) |
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
YapYLI.pdf | | 359.3 kB | Adobe PDF | OPEN | None | View/Download |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.