Authors: KANG WEI
Keywords: Social Media Content, Summarization, Hierarchical Tag Cloud, Hierarchical Summary, Visualization, Social Media Analytics
Issue Date: 26-Feb-2015
Citation: KANG WEI (2015-02-26). ANALYZING SOCIAL MEDIA CONTENTS. ScholarBank@NUS Repository.
Abstract: The proliferation of social media services has led to the production of huge amounts of data, which poses great challenges for information acquisition, integration and digestion. To extract compact yet useful information, many algorithms have been proposed to summarize social media contents, e.g., tweets and news feeds. However, it remains challenging to extract summaries efficiently and to support the interactive exploration of such data. Most existing methods also extract summaries without considering the semantic meanings and relationships within those summaries. Even with the extracted information, users may still find it hard to obtain knowledge that matches their preferences. To tackle these challenges, we propose in this thesis two novel summarization approaches for generating hierarchical summaries. One approach generates summaries from spatiotemporal social media contents and builds a system to visualize the summaries as hierarchical tag clouds. The other approach focuses on introducing semantics into each summary. In addition, a system with four data analytics tools is built to manage social media contents and extracted knowledge via Wikipedia.
Specifically, we first propose Vesta, which enables users to extract and interactively explore summaries of social media contents published in a certain spatiotemporal range. These summaries are represented using a novel concept called hierarchical tag clouds, which allow users to zoom in or out to explore more specific or more general tag summaries. A novel biclustering approach is proposed to extract summaries, from which topic hierarchies are generated for partitions of the data. At runtime, the topic hierarchies of the relevant partitions are merged to form tag hierarchies, which are used to construct hierarchical tag clouds for visualization.
Next, we propose Heron to generate hierarchical summaries from any set of social media contents. It makes use of the DBpedia ontology, through which semantically hierarchical relationships are introduced into each summary. Specifically, a summary consists of a set of semantically related Wikipedia entities that are extracted from social media contents. The entities are further classified into different subsets, which are mapped to the corresponding classes in a sub-hierarchy of the DBpedia ontology to reveal subsumptive relationships. We propose a model named multi-level Naive Bayes Classifiers to refine the classes of entities so as to reduce the effect of inaccuracies and inconsistencies in Wikipedia. Considering the possibility that many entities may be mapped to a single class, we further propose to select the top-ranked entities for each subset of a summary.
Finally, we present a novel system named Trendspedia, which brings proper context to continuously incoming social media contents, so that massive amounts of information can be indexed, organized and analyzed around Wikipedia entities. Four data analytics tools are employed. With this system, users can easily pinpoint valuable information and knowledge, and navigate to other closely related entities through an information network for further exploration. Extensive experimental studies have verified the efficiency, effectiveness and scalability of our approaches. We believe that our summarization approaches, as well as the Trendspedia system, can greatly promote and facilitate the exploration of insights hidden in huge volumes of social media contents.
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: PhD_Thesis_KangWei_HT091860U_Final-signed.pdf
Size: 3.28 MB
Format: Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.