Please use this identifier to cite or link to this item: https://doi.org/10.1109/ICTAI.2004.122
Title: XML clustering by principal component analysis
Authors: Liu, J.
Wang, J.T.L.
Hsu, W. 
Herbert, K.G.
Issue Date: 2004
Source: Liu, J., Wang, J.T.L., Hsu, W., Herbert, K.G. (2004). XML clustering by principal component analysis. Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI: 658-662. ScholarBank@NUS Repository. https://doi.org/10.1109/ICTAI.2004.122
Abstract: XML is increasingly important in data exchange and information management. Considerable effort has been spent on developing efficient techniques for storing, querying, indexing and accessing XML documents. In this paper we propose a new approach to clustering XML data. In contrast to previous work, which focused on documents defined by different DTDs, the proposed method works for documents with the same DTD. Our approach is to extract features from documents, modeled as ordered labeled trees, and transform the documents into vectors in a high-dimensional Euclidean space based on the occurrences of the features in the documents. We then reduce the dimensionality of the vectors by principal component analysis (PCA) and cluster the vectors in the reduced-dimensional space. PCA enables one to identify vectors with co-occurring features, thereby enhancing the accuracy of the clustering. Experimental results based on documents obtained from Wisconsin's XML data bank show the effectiveness and good performance of the proposed techniques. © 2004 IEEE.
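Illustration: the pipeline outlined in the abstract (feature extraction, occurrence vectors, PCA, clustering) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The record does not say which tree features the paper extracts, so parent/child tag pairs are used here as a hypothetical stand-in. For brevity the sketch keeps only the first principal component and uses a simple two-cluster 1-D k-means, both of which are assumptions. The function names (edge_features, to_vectors, first_component, two_means) and the sample documents are invented for the example.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def edge_features(xml_text):
    """Count (parent tag, child tag) pairs; a hypothetical feature choice."""
    root = ET.fromstring(xml_text)
    return Counter((p.tag, c.tag) for p in root.iter() for c in p)


def to_vectors(docs):
    """Map each document to a vector of feature occurrence counts."""
    counts = [edge_features(d) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[float(c[f]) for f in vocab] for c in counts]


def first_component(rows, iters=200):
    """Project rows onto the leading principal direction (power iteration)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start vector
    for _ in range(iters):
        # w = X^T (X v), which is proportional to the covariance matrix times v
        proj = [sum(x * y for x, y in zip(r, v)) for r in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(x * y for x, y in zip(r, v)) for r in centered]


def two_means(scores, iters=20):
    """Two-cluster k-means on 1-D scores, seeded at the min and max score."""
    lo, hi = min(scores), max(scores)
    labels = [0] * len(scores)
    for _ in range(iters):
        labels = [0 if abs(s - lo) <= abs(s - hi) else 1 for s in scores]
        a = [s for s, l in zip(scores, labels) if l == 0]
        b = [s for s, l in zip(scores, labels) if l == 1]
        if a:
            lo = sum(a) / len(a)
        if b:
            hi = sum(b) / len(b)
    return labels


# Invented sample documents that could share one DTD but differ in structure.
docs = [
    "<rec><title/><author/><author/></rec>",
    "<rec><title/><author/><author/><author/></rec>",
    "<rec><title/><editor/><year/></rec>",
    "<rec><title/><editor/><editor/><year/></rec>",
]
labels = two_means(first_component(to_vectors(docs)))
print(labels)
```

In this toy input, the author-heavy records and the editor-and-year records fall on opposite sides of the first principal component, so they should receive different cluster labels. A fuller version would keep several components and use a general-purpose clustering method.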
Source Title: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
URI: http://scholarbank.nus.edu.sg/handle/10635/40414
ISBN: 076952236X
ISSN: 10823409
DOI: 10.1109/ICTAI.2004.122
Appears in Collections: Staff Publications


Scopus citations: 26 (checked on Dec 11, 2017)
Page views: 71 (checked on Dec 16, 2017)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.