Title: Kernel engineering on parse trees
Authors: SUN JUN
Keywords: Kernel Methods, Natural Language Processing, Tree Sequence, Structure Alignment, Relation Extraction, Machine Translation
Issue Date: 14-Jul-2011
Citation: SUN JUN (2011-07-14). Kernel engineering on parse trees. ScholarBank@NUS Repository.
Abstract: Recently, Natural Language Processing (NLP) has benefited greatly from progress in machine learning methods for large, data-driven applications. Some NLP tasks require complex data representations in order to analyze syntactic and semantic features in depth. In many cases, the input data is represented as sequences, trees, or even graphs. Traditional feature-based methods transform such structured input into vector representations through sophisticated feature engineering, an approach that has been argued to be incapable of fully exploring the structural features. Alternatively, kernel methods can explore a very high-dimensional feature space for these complex input structures without explicitly representing the input data as a feature vector. For tree structures, tree kernels can explore the subtree features of parse trees without explicitly enumerating each type of subtree. However, previous tree kernels explore structural features with respect to the single-subtree representation. Large single subtrees may be sparse in the data set, which prevents large structures from being used effectively. Sometimes only certain parts of a large subtree are beneficial rather than the entire subtree; in that case, using the entire structure may introduce noise. To address these deficiencies, this dissertation systematically investigates the phrase parse tree and attempts to design more sophisticated kernels that explore the structural features embedded in phrase parse trees beyond the single-subtree representation. Specifically, it proposes tree-sequence-based kernels, which adopt the subtree sequence as the basic feature type for exploring the structural features of phrase parse trees. A variety of kernels are built on the subtree sequence structure, and its advantages are demonstrated on various NLP applications.
Using tree (sequence) kernels over multiple parse trees, a kernel-based alignment model is proposed for the task of bilingual subtree alignment, with which translation performance can be effectively improved. From a more general perspective, this dissertation systematically explores disconnected structural features in parse trees by means of kernels. In this respect, it may provide novel views of structural features for NLP applications.
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: thesis.pdf
Size: 1.02 MB
Format: Adobe PDF


