Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/244769
Title: KEYWORD ASSISTED TOPIC MODELING OF CHINESE CENTRAL GOVERNMENT DOCUMENTS
Authors: GAO WENHAN
ORCID iD: orcid.org/0009-0005-5108-9919
Keywords: Topic Modelling, Text Mining, Text Analysis, LDA, Policy, Keywords
Issue Date: 31-May-2023
Citation: GAO WENHAN (2023-05-31). KEYWORD ASSISTED TOPIC MODELING OF CHINESE CENTRAL GOVERNMENT DOCUMENTS. ScholarBank@NUS Repository.
Abstract: Topic modelling is a powerful tool for uncovering latent structures and patterns in unstructured text data. However, existing approaches such as Latent Dirichlet Allocation (LDA) may not fully exploit the available document information or prior knowledge of the topic structure. In this work, we present a novel semi-supervised topic modelling framework based on the Keyword-Assisted Topic Model (KeyATM) that uses seeded keywords to guide the underlying topic structure and incorporates document covariate data. We apply the framework to a corpus of Chinese government documents and show that it identifies meaningful words for each predetermined policy topic and characterises how document-topic distributions vary across covariates. Guided by the seeded keywords, the framework is also more robust than conventional LDA to the choice of Gibbs sampling starting point. These results highlight the potential of our approach for advancing topic modelling in real-world applications with complex data.
URI: https://scholarbank.nus.edu.sg/handle/10635/244769
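The abstract contrasts plain LDA with keyword-guided topic modelling and reports greater robustness to the Gibbs sampling starting point. As a rough illustration only, the Python sketch below implements a collapsed Gibbs sampler for LDA in which each seed keyword receives extra prior pseudo-counts in its assigned topic (a seeded-LDA-style simplification). It is not KeyATM, which instead gives each keyword topic a separate distribution over its keywords and mixes it with a regular topic-word distribution, and it ignores document covariates. The function names, the `seeds` input format, the hyperparameter values and the toy corpus are all hypothetical.

```python
import random
from collections import defaultdict


def seeded_lda_gibbs(docs, seeds, n_topics, alpha=0.1, beta=0.01,
                     seed_boost=1.0, n_iter=200, rng_seed=0):
    """Hypothetical helper: collapsed Gibbs LDA with seed-boosted priors.

    docs:  list of tokenised documents (lists of strings)
    seeds: dict {topic index: [seed keywords]}; topics absent from the
           dict are unguided, as in plain LDA
    Returns phi, a list of dicts with phi[k][w] = p(w | topic k).
    """
    # rng_seed fixes the random initial assignment (the Gibbs starting
    # point) and all subsequent sampling draws.
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})

    # Asymmetric topic-word Dirichlet prior: each seed word gets
    # `seed_boost` extra pseudo-counts in its own topic only.
    # This is a seeded-LDA simplification, NOT KeyATM's model.
    prior = [{w: beta for w in vocab} for _ in range(n_topics)]
    for k, words in seeds.items():
        for w in words:
            if w in prior[k]:
                prior[k][w] += seed_boost
    prior_sum = [sum(p.values()) for p in prior]

    # Count tables for the collapsed sampler.
    n_dk = [[0] * n_topics for _ in docs]               # doc-topic
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # topic-word
    n_k = [0] * n_topics                                # topic totals

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z_d):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta_tw) / (n_t + sum_w beta_tw)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + prior[t][w])
                           / (n_k[t] + prior_sum[t]) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Posterior-mean estimate of each topic-word distribution.
    return [{w: (n_kw[k][w] + prior[k][w]) / (n_k[k] + prior_sum[k])
             for w in vocab} for k in topics]


def top_words(phi_k, n=3):
    """Return the n most probable words of one topic."""
    return sorted(phi_k, key=phi_k.get, reverse=True)[:n]


# Hypothetical toy corpus with two pre-specified "policy" topics.
docs = [doc.split() for doc in [
    "tax growth investment tax market",
    "pollution emission energy pollution water",
    "growth market investment tax",
    "emission water energy pollution",
]]
seeds = {0: ["tax", "growth"], 1: ["pollution", "emission"]}

# Compare runs from different Gibbs starting points.
for start in range(3):
    phi = seeded_lda_gibbs(docs, seeds, n_topics=2, rng_seed=start)
    print(start, [top_words(phi_k) for phi_k in phi])
```

Because each seed list is tied to a fixed topic index, runs from different starting points (`rng_seed`) tend not to permute the labels of the seeded topics, which is one intuition for the stability the abstract reports; unseeded LDA has no such anchor and can return the same topics in a different order.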
Appears in Collections: Master's Theses (Open)
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
GaoWH.pdf | | 1.9 MB | Adobe PDF | OPEN | None
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.