Title: Compactly supported basis functions as support vector kernels: Capturing feature interdependence in the embedding space
Authors: PETER WITTEK
Keywords: kernel methods, support vector machines, feature engineering, wavelets, text classification, semantic kernels
Issue Date: 7-Apr-2010
Citation: PETER WITTEK (2010-04-07). Compactly supported basis functions as support vector kernels: Capturing feature interdependence in the embedding space. ScholarBank@NUS Repository.
Abstract: [...] have a negative impact on the overall effectiveness of a machine learning algorithm. Numerous methods have been developed to choose the most important features, based either on the statistical properties of the features (feature selection) or on the effectiveness of the learning algorithm (feature wrappers). Feature extraction, on the other hand, aims to create a new, smaller set of features by using relationships between variables in the original set. In any of these approaches, reducing the number of features may also speed up the learning process; kernel methods, however, can deal with very large numbers of features efficiently. This thesis proposes a kernel method that keeps all the features and uses the relationships between them to improve effectiveness. The broader framework is defined by wavelet kernels. Wavelet kernels have been introduced for both support vector regression and classification. Most of these wavelet kernels do not use the inner product of the embedding space, but apply wavelets in a fashion similar to radial basis function kernels. Wavelet analysis is typically carried out on data with a temporal or spatial relation between consecutive data points. The new kernel requires the feature set to be ordered, such that consecutive features are related either statistically or through some external knowledge source; this relation is meant to act in a similar way to the temporal or spatial relation in other domains.
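The abstract does not specify how the feature ordering is computed; a minimal illustrative sketch, assuming a hypothetical greedy chaining of features by pairwise correlation (so that statistically related features end up adjacent), could look like this:

```python
import numpy as np

def order_features(X):
    """Greedily chain features so consecutive features are highly
    correlated. This is a hypothetical stand-in for the thesis's
    (unspecified here) ordering algorithm, for illustration only."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # |correlation| between feature columns
    np.fill_diagonal(corr, -np.inf)              # ignore self-correlation
    n = corr.shape[0]
    order = [0]                                  # start from an arbitrary feature
    remaining = set(range(1, n))
    while remaining:
        last = order[-1]
        # place next the unplaced feature most correlated with the last one
        nxt = max(remaining, key=lambda j: corr[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

With such an ordering, two statistically related features (e.g. a feature and a near-copy of it) end up next to each other, which is the precondition the kernel below relies on.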
The thesis proposes an algorithm which performs this ordering. The ordered feature set makes it possible to interpret the vector representation of an object as a series of equally spaced observations of a hypothetical continuous signal. The new kernel maps the vector representation of objects to the L2 function space, where appropriately chosen compactly supported basis functions exploit the relation between features when calculating the similarity between two objects. Experiments on general-domain data sets show that the proposed kernel outperforms baseline kernels with statistical significance when there are many relevant features and these features are correlated, whether strongly or loosely. This is the typical case for textual data sets. The suggested approach is not entirely new to text representation. In order to be efficient, the mathematical objects of a formal model, such as vectors, have to reasonably approximate language-related phenomena such as word meaning inherent in index terms. The classical model of text representation, however, is only approximate when it comes to representing word meaning. Adding expansion terms to the vector representation can also improve effectiveness. The choice of expansion terms is based either on distributional similarity or on some lexical resource that establishes relationships between terms. Existing methods regard all expansion terms as equally important. The proposed kernel, however, discounts less important expansion terms according to a semantic similarity distance. This approach improves effectiveness in both text classification and information retrieval.
URI: http://scholarbank.nus.edu.sg/handle/10635/18828
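The L2 embedding described above can be sketched concretely. Assuming, purely for illustration, linear B-spline ("hat") basis functions on unit-spaced nodes (the thesis's exact basis and spacing are not given here), each feature vector becomes the coefficient vector of a function in L2, and the kernel reduces to x^T G y, where G is the Gram matrix of basis-function overlaps. Because the basis functions have compact support, only identical and adjacent (i.e. related, after ordering) features contribute to the similarity:

```python
import numpy as np

def bspline_gram(n, h=1.0):
    """Gram matrix <phi_i, phi_j> for linear B-spline (hat) functions
    on n equally spaced nodes with spacing h. Overlaps are nonzero only
    for identical or adjacent basis functions (compact support);
    boundary effects are ignored for simplicity."""
    G = np.zeros((n, n))
    np.fill_diagonal(G, 2.0 * h / 3.0)  # integral of phi_i * phi_i
    idx = np.arange(n - 1)
    G[idx, idx + 1] = h / 6.0           # overlap of neighboring hats
    G[idx + 1, idx] = h / 6.0
    return G

def basis_kernel(x, y, G):
    """Similarity of two objects embedded in L2: K(x, y) = x^T G y."""
    return x @ G @ y
```

With an ordered feature set, two documents that share no terms but contain adjacent (related) terms still receive a nonzero similarity, while features further apart than one position contribute nothing; this is the mechanism by which feature interdependence enters the kernel.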
Appears in Collections: Ph.D Theses (Open)
Files in This Item:
Wittek.pdf (1.18 MB, Adobe PDF)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.