Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/244769
Title: KEYWORD ASSISTED TOPIC MODELING OF CHINESE CENTRAL GOVERNMENT DOCUMENTS
Authors: GAO WENHAN
ORCID iD: orcid.org/0009-0005-5108-9919
Keywords: Topic Modelling, Text Mining, Text Analysis, LDA, Policy, Keywords
Issue Date: 31-May-2023
Citation: GAO WENHAN (2023-05-31). KEYWORD ASSISTED TOPIC MODELING OF CHINESE CENTRAL GOVERNMENT DOCUMENTS. ScholarBank@NUS Repository.
Abstract: Topic modelling is a powerful tool for uncovering latent structures and patterns within unstructured text data. However, existing approaches such as Latent Dirichlet Allocation (LDA) may not fully exploit available document information or prior knowledge of the topic structure. In this work, we present a novel semi-supervised topic modelling framework based on the Keyword-Assisted Topic Model (KeyATM) that leverages seeded keywords to incorporate document covariate data and better control for underlying topic structures. We apply our framework to a corpus of Chinese government documents, demonstrating its ability to identify meaningful words for various predetermined policy topics and to characterise document-topic distributions within different covariates for enhanced insights. Our framework also shows superior robustness to variations in initial Gibbs sampling starting points compared to conventional LDA, thanks to the guidance of the seeded keywords. These results highlight the potential of our approach for advancing topic modelling in real-world applications with complex data.
URI: https://scholarbank.nus.edu.sg/handle/10635/244769
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: GaoWH.pdf
Size: 1.9 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.