Topic detection using maximal frequent sequences

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/14686

Title:	Topic detection using maximal frequent sequences
Authors:	YAP YANG LENG, IVAN
Keywords:	Topic Detection, Document Clustering, Maximal Frequent Sequences, Equivalence Classes
Issue Date:	18-Mar-2005
Citation:	YAP YANG LENG, IVAN (2005-03-18). Topic detection using maximal frequent sequences. ScholarBank@NUS Repository.
Abstract:	When analyzing a document collection, a key detail is the number of distinct topics it contains. Traditional clustering-based methods for topic detection do not take word sequence into account and are usually not equipped to describe topic content. We present a new method that addresses both limitations: it uses Maximal Frequent word Sequences (MFSs) as building blocks for identifying distinct topics. Our method is a hybrid of an existing algorithm for discovering equivalence classes containing MFSs and a heuristic for grouping equivalence classes into topic clusters. The results of applying our method to a collection of newswire articles and to Manufacturing technical paper abstracts suggest that it favors datasets whose topics are specific and conceptually well separated. Our method is also useful for generating a list of distinct topics from a dataset whose topics are not clearly defined; this list serves as an intermediate result for understanding and further partitioning the dataset.
URI:	http://scholarbank.nus.edu.sg/handle/10635/14686
Appears in Collections:	Master's Theses (Open)

File	Description	Size	Format	Access Settings	Version
YapYLI.pdf		359.3 kB	Adobe PDF	OPEN	None
