Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/244769
DC Field | Value | |
---|---|---|
dc.title | KEYWORD ASSISTED TOPIC MODELING OF CHINESE CENTRAL GOVERNMENT DOCUMENTS | |
dc.contributor.author | GAO WENHAN | |
dc.date.accessioned | 2023-08-31T18:00:29Z | |
dc.date.available | 2023-08-31T18:00:29Z | |
dc.date.issued | 2023-05-31 | |
dc.identifier.citation | GAO WENHAN (2023-05-31). KEYWORD ASSISTED TOPIC MODELING OF CHINESE CENTRAL GOVERNMENT DOCUMENTS. ScholarBank@NUS Repository. | |
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/244769 | |
dc.description.abstract | Topic modelling is a powerful tool for uncovering latent structures and patterns within unstructured text data. However, existing approaches such as Latent Dirichlet Allocation (LDA) may not fully exploit available document information or prior knowledge of the topic structure. In this work, we present a novel semi-supervised topic modelling framework based on the Keyword-Assisted Topic Model (KeyATM) that leverages seeded keywords to incorporate document covariate data and better control for underlying topic structures. We apply our framework to a corpus of Chinese government documents, demonstrating its ability to identify meaningful words for various predetermined policy topics and to characterise document-topic distributions within different covariates for enhanced insights. Our framework also shows superior robustness to variations in initial Gibbs sampling starting points compared to conventional LDA, thanks to the guidance of the seeded keywords. These results highlight the potential of our approach for advancing topic modelling in real-world applications with complex data. | |
dc.language.iso | en | |
dc.subject | Topic Modelling, Text Mining, Text Analysis, LDA, Policy, Keywords | |
dc.type | Thesis | |
dc.contributor.department | STATISTICS AND DATA SCIENCE | |
dc.contributor.supervisor | David John Nott | |
dc.contributor.supervisor | Ying Chen | |
dc.description.degree | Master's | |
dc.description.degreeconferred | MASTER OF SCIENCE (RSH-FOS) | |
dc.identifier.orcid | 0009-0005-5108-9919 | |
Appears in Collections: | Master's Theses (Open) |
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
GaoWH.pdf | | 1.9 MB | Adobe PDF | OPEN | None | View/Download |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.