SELF-ENHANCED VOCABULARY LEARNING LATENT DIRICHLET ALLOCATION

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/238651

Title:	SELF-ENHANCED VOCABULARY LEARNING LATENT DIRICHLET ALLOCATION
Authors:	HUANG YI HSIANG
Keywords:	tweets, LDA, BTM, NLP, topic, text
Issue Date:	31-Aug-2022
Citation:	HUANG YI HSIANG (2022-08-31). SELF-ENHANCED VOCABULARY LEARNING LATENT DIRICHLET ALLOCATION. ScholarBank@NUS Repository.
Abstract:	Many studies have sought to identify valuable information in text through sentiment analysis and classification. For the latter, topic models such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and the Biterm Topic Model (BTM) (Yan et al., 2013) are conventional probabilistic models designed to unveil the latent topic structure within texts. In this paper, we propose a novel topic model for short texts called Self-Enhanced Vocabulary Learning Latent Dirichlet Allocation (SVL-LDA), which extends LDA by incorporating rolling-window training of the parameters of the prior distribution to handle the sparsity problem. Specifically, a larger base of information is used to fine-tune the corpus-level parameters. The LDA module then identifies the coherence of topic-document and word-topic associations. Empirical results show that our approach appears to outperform the baseline methods under certain conditions and demonstrate its practical application to social media.
URI:	https://scholarbank.nus.edu.sg/handle/10635/238651
Appears in Collections:	Master's Theses (Open)

File	Description	Size	Format	Access Settings	Version
HuangYH.pdf		1.6 MB	Adobe PDF	OPEN	None
