Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/238651
Title: SELF-ENHANCED VOCABULARY LEARNING LATENT DIRICHLET ALLOCATION
Authors: HUANG YI HSIANG
Keywords: tweets, LDA, BTM, NLP, topic, text
Issue Date: 31-Aug-2022
Citation: HUANG YI HSIANG (2022-08-31). SELF-ENHANCED VOCABULARY LEARNING LATENT DIRICHLET ALLOCATION. ScholarBank@NUS Repository.
Abstract: Many studies have sought to extract valuable information from text through sentiment analysis and classification. For the latter, topic models such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and the Biterm Topic Model (BTM) (Yan et al., 2013) are conventional probabilistic models designed to unveil the latent topic structure within texts. In this paper, we propose a novel topic model for short texts called Self-Enhanced Vocabulary Learning Latent Dirichlet Allocation (SVL-LDA), which extends LDA by incorporating rolling-window training of the prior distribution's parameters to handle the sparsity problem. Specifically, a larger base of information is used to fine-tune the corpus-level parameters. The LDA module then identifies the coherence of the topic-document and word-topic associations. Empirical results show that our approach appears to outperform the baseline methods under certain conditions and demonstrate its practical application to social media.
URI: https://scholarbank.nus.edu.sg/handle/10635/238651
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: HuangYH.pdf
Size: 1.6 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.