Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/27886
Title: | Step: Set of T-uples expansion using the web |
Authors: | LIU YUGANG |
Keywords: | set expansion, wrapper construction, ranking candidates |
Issue Date: | 5-Aug-2011 |
Citation: | LIU YUGANG (2011-08-05). Step: Set of T-uples expansion using the web. ScholarBank@NUS Repository. |
Abstract: | Set expansion is the task of finding members of a semantic class, the set, given a small subset of its members, the seeds. Set expansion systems have leveraged the explosion in the number of HTML-formatted lists of all sorts and kinds on the World Wide Web. Such syntactical set expansion from the Web works particularly well for the expansion of sets of atomic values. In this thesis, we present STEP, a set of t-uples expansion system. STEP extends the SEAL set expansion system [Wang 2007] to the expansion of sets of t-uples, or relations as in Codd's relational model. The generalization from the expansion of sets of atomic values to the expansion of sets of t-uples raises problems at every stage of the expansion process: locating the sources, constructing wrappers (the specific contexts that bracket the seeds) and extracting candidates, and ranking the candidates. We therefore argue that set of t-uples expansion compels extensions to the existing expansion process as proposed by many solutions, including SEAL. We show that set of t-uples expansion can be achieved effectively by: (i) making the wrappers more flexible, (ii) expanding the search to more pages, in particular to the collections of pages that belong to the same website, as t-uples may be located on multiple pages rather than on the same page, and (iii) considering more entities, such as domains, to improve the ranking of candidates. We empirically evaluate the performance of STEP. We compare the successive techniques that we introduce with the baselines provided by SEAL and show significant improvement. We also study different factors that can affect the performance of STEP and offer some constructive suggestions. |
URI: | http://scholarbank.nus.edu.sg/handle/10635/27886 |
Appears in Collections: | Master's Theses (Open) |
Files in This Item:
File | Description | Size | Format | Access Settings | Version |
---|---|---|---|---|---|
LIUYugang.pdf | | 2.09 MB | Adobe PDF | OPEN | None |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.