Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/53707
Title: Interpreting Time in Text Summarizing Text with Time
Authors: NG JUN PING
Keywords: temporal, timeline, information extraction, summarization, crowdsourcing, tree kernel
Issue Date: 30-Dec-2013
Citation: NG JUN PING (2013-12-30). Interpreting Time in Text Summarizing Text with Time. ScholarBank@NUS Repository.
Abstract: In this thesis, I study two key steps in building a timeline, a logical representation of the temporal information found within text from newswire articles: 1) intra-sentence event-timex (E-T) temporal relationship classification, and 2) article-wide event-event (E-E) temporal relationship classification. Events and time expressions (timexes) are the basic units of temporal information in text. These two steps allow us to build an understanding of the relative ordering between these basic temporal units. For both of these classification tasks, I propose more semantically motivated features, namely the use of typed dependency parses and discourse analyses, to achieve better classification performance. This is in contrast to much work in the existing literature, which has focused on lexico-syntactic features. Working on E-T temporal relationship classification, I also show that crowdsourcing is a very cost-effective and viable avenue through which a high-quality temporal corpus can be built. Making use of the structure of a sentence, I propose a unique way to identify instances which are computationally and cognitively easier. Excluding these instances from a corpus does not degrade subsequent classifier performance significantly. This allows cost savings of up to 37% when building an E-T temporal corpus. Besides putting together a state-of-the-art temporal processing system, this thesis also validates the efficacy and utility of the timelines that are automatically derived. Temporal information from these timelines is incorporated into a competitive baseline multi-document summarization system. I propose several features derived from timelines and show that they lead to a 4.1% improvement in summarization performance. I also introduce TimeMMR, a modification to the traditional Maximal Marginal Relevance (MMR) algorithm. TimeMMR is shown to be useful in the summarization of some document sets.
To further improve the performance gains derived from the use of temporal information, I propose a reliability filtering metric which gauges how accurate and useful a timeline is. By selectively making use of timelines guided by this reliability filtering metric, overall summarization performance is increased by a statistically significant 5.9%.
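For readers unfamiliar with the traditional MMR algorithm that the abstract refers to, the following is a minimal sketch of the standard greedy MMR selection loop, not the thesis's implementation. The names `mmr_select`, `jaccard`, and the trade-off parameter `lam` are illustrative assumptions. The abstract does not describe how TimeMMR modifies MMR, so that modification is not shown here.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; an assumed stand-in for any sentence similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def mmr_select(
    candidates: Sequence[str],
    relevance: Callable[[str], float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedily pick up to k sentences, trading relevance against redundancy.

    Each step picks the sentence maximizing
        lam * relevance(s) - (1 - lam) * max similarity(s, already selected).
    """
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(s: str) -> float:
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max((similarity(s, t) for t in selected), default=0.0)
            return lam * relevance(s) - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam` close to 1 the loop reduces to ranking by relevance alone; lower values penalize sentences that overlap with those already in the summary.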
URI: http://scholarbank.nus.edu.sg/handle/10635/53707
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: Thesis_v20140509.pdf
Size: 1.2 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.