Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/73709
Title: On the effectiveness of latent semantic analysis for the categorization of call centre records
Authors: Menon, R.
Keerthi, S.S. 
Loh, H.T. 
Brombacher, A.C.
Keywords: Call Centre Records
Latent Semantic Analysis
Singular Value Decomposition
Support Vector Machines
Text Classification
Issue Date: 2004
Source: Menon, R., Keerthi, S.S., Loh, H.T., Brombacher, A.C. (2004). On the effectiveness of latent semantic analysis for the categorization of call centre records. IEEE International Engineering Management Conference 2: 546-550. ScholarBank@NUS Repository.
Abstract: Text categorization is an important component in many information management tasks such as real-time sorting of emails or files. An important consideration in text categorization performance is the choice of feature sets for text representation. A popular approach for text representation is the vector space model, which represents the 'units of content' of a document as a vector. In most situations, each distinct word is used as a content unit. However, such a representation, called the bag-of-words approach, has drawbacks. Firstly, a large number of features is required for document representation. Secondly, it does not take into account the effects of synonymy and polysemy, which could have an impact on classification accuracy. Latent semantic analysis (LSA) addresses these shortcomings by simultaneously modelling all the interrelationships among terms and documents, using the singular value decomposition technique, which allows the terms and documents to be represented in a reduced dimensional space. It has been widely used to enhance the performance of information retrieval systems and has recently been used for text classification as well. In this study, we further explore its use for the classification of call centre data sets obtained from a multinational company. These spontaneously created documents exhibit characteristics different from those of the benchmark data sets used in most studies, hence necessitating this investigation. Further, the effect of various weighting schemes and of the number of dimensions on classification was explored. Results revealed that the LSA approach marginally improved the classification accuracy. It was also found that the weighting scheme used did not significantly affect classification performance, unlike in some retrieval applications, where as much as a 40% average improvement in performance was observed. Further, the widely recommended use of 100 to 300 dimensions for document representation was found to be inapplicable for the investigated data sets. © 2004 IEEE.
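The pipeline the abstract outlines (bag-of-words matrix, term weighting, then a truncated singular value decomposition) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents, the tf-idf weighting, the function names and the choice of k = 2 are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy corpus in the style of short call-centre notes
# (invented for illustration; not data from the paper).
docs = [
    "screen blank no display",
    "display flickers screen dim",
    "remote control battery dead",
    "remote buttons not working battery",
]

def term_document_matrix(docs):
    """Build a raw term-frequency matrix: one row per term, one column per document."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            tf[index[w], j] += 1.0
    return tf, vocab

def tfidf_weight(tf):
    """One possible weighting scheme (an assumption here): tf * log(N / df)."""
    n_docs = tf.shape[1]
    df = np.count_nonzero(tf, axis=1)  # number of documents containing each term
    idf = np.log(n_docs / df)
    return tf * idf[:, None]

def lsa_project(X, k):
    """Return k-dimensional document vectors from a truncated SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Rows of the result are documents expressed in the top-k latent dimensions.
    return (s[:k, None] * Vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

tf, vocab = term_document_matrix(docs)
X = tfidf_weight(tf)
doc_vectors = lsa_project(X, k=2)
```

In a classification setting, the rows of `doc_vectors` would be passed to a classifier such as a support vector machine in place of the full bag-of-words vectors. The number of retained dimensions `k` and the weighting function are the two parameters whose effect the study examines.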
Source Title: IEEE International Engineering Management Conference
URI: http://scholarbank.nus.edu.sg/handle/10635/73709
Appears in Collections: Staff Publications
