Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/14302
Title: Information extraction from dynamic web sources
Authors: ROSHNI MOHAPATRA
Keywords: Web Mining, Information Extraction, Machine Learning, Information Agents, Wrappers, Wrapper Maintenance
Issue Date: 19-Oct-2004
Source: ROSHNI MOHAPATRA (2004-10-19). Information extraction from dynamic web sources. ScholarBank@NUS Repository.
Abstract: This thesis investigates wrapper induction from web sites whose layout may change over time. We formulate the reinduction problem and identify wrapper induction from an incomplete label as a key problem to be solved. We propose a novel algorithm for incrementally inducing LR wrappers and show that this algorithm asymptotically identifies the correct wrapper as the number of tuples increases. This property is used to propose an LR wrapper reinduction algorithm. The algorithm requires examples to be provided exactly once; thereafter it can detect layout changes and reinduce wrappers automatically, so long as the changed wrappers remain within the LR class. Our experimental studies demonstrate that the reinduction algorithm is able to achieve near-perfect performance.
URI: http://scholarbank.nus.edu.sg/handle/10635/14302
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: Roshni Mohapatra-HT027101N-Information Extraction from Dynamic Web sources.pdf; Size: 1.02 MB; Format: Adobe PDF; Access Settings: OPEN; Version: None
File: abstract.pdf; Size: 20.57 kB; Format: Adobe PDF; Access Settings: OPEN; Version: None

Page view(s): 233 (checked on Dec 18, 2017)
Download(s): 331 (checked on Dec 18, 2017)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.