Please use this identifier to cite or link to this item:
Creators: Farseev, A.
Chua Tat Seng 
External Contact: Aleksandr Farseev
Subject: Demographic profile
Mobility profile
Multiple source integration
User profile learning
DOI: doi:10.25540/S4XV-0TK9

With the rapid growth of multi-source social media resources, comprehensive user profile learning has become a backbone of various application domains. User profile components such as mobility and demographics describe social media users from different views. However, little research has been done on multi-source multimodal user profile learning. Moreover, no benchmark dataset has been released for user mobility and demographic profiling.

Here we introduce a multi-source dataset created by the Lab for Media Search at the National University of Singapore. The dataset includes six types of features extracted from the collected data: location semantics features, location semantics LDA-based features, text LDA-based features, text LIWC features, sentiment and writing style features, and ImageNet image concept features. It also includes ground-truth data from three geographical regions: Singapore, New York, and London. To cover the most popular data modalities (visual, textual, and location data), we incorporate the following social media sources: Foursquare (the largest location-based social network) as the location data source; Twitter (the microblogging service with the largest English-speaking user base) as the textual data source; Instagram (the most popular photo-sharing service) as the visual data source; and Facebook as the ground-truth source. We also provide baseline results for user demographic profiling, obtained by learning from the text, image, and location data with an ensemble model. The benchmark results show that it is possible to learn models from these data with the aim of improving user profile learning. For more details about user profile learning and the feature descriptions, please see the slides.
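As an illustration of how predictions from the text, image, and location sources might be combined, the following is a minimal late-fusion sketch. It is not the ensemble model used for the baseline results, whose details are not given on this page. The source names, the `late_fusion` and `predict` helpers, and the weighted-average rule are all assumptions made for the example. It also assumes that some users have no data from a given source, which is represented here by `None`.

```python
from typing import Dict, List, Optional, Sequence


def late_fusion(
    per_source_probs: Dict[str, Optional[List[float]]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of class-probability vectors over the available sources.

    `per_source_probs` maps a (hypothetical) source name such as "text",
    "image", or "location" to that source's class probabilities, or to None
    when the user has no data from that source.
    """
    weights = weights or {}
    fused: Optional[List[float]] = None
    total_weight = 0.0
    for source, probs in per_source_probs.items():
        if probs is None:  # skip sources the user never posted to
            continue
        w = weights.get(source, 1.0)  # unweighted sources count equally
        if fused is None:
            fused = [0.0] * len(probs)
        if len(probs) != len(fused):
            raise ValueError("all sources must score the same set of classes")
        for i, p in enumerate(probs):
            fused[i] += w * p
        total_weight += w
    if fused is None or total_weight == 0.0:
        raise ValueError("no usable source for this user")
    return [v / total_weight for v in fused]


def predict(
    per_source_probs: Dict[str, Optional[List[float]]],
    labels: Sequence[str],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Return the label with the highest fused probability."""
    fused = late_fusion(per_source_probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Example: a user with text and location data but no image data.
user = {"text": [0.6, 0.4], "image": None, "location": [0.2, 0.8]}
label = predict(user, labels=["male", "female"])
```

Averaging only over the sources that are present keeps users with partial data in the evaluation. Per-source weights could be tuned on held-out ground truth if one source proves more reliable than the others.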

Our dataset can be used for both descriptive and prescriptive research. That is to say, we do not intend to constrain future research on user profile learning, since the available ground truth makes it possible to tackle other contemporary problems. We list some potential research topics that can be pursued with our released dataset:

  • Complete demographic profiling. Researchers are encouraged to learn other demographic attributes, such as occupation, personality, and social status.
  • Extended mobility profiling. In the current study, we focused on category-specific user mobility profiling, while it would be useful to incorporate spatio-temporal factors of users' movement.
  • Causality patterns extraction. It is important to discover potential causal relationships between events from multiple data sources. For example, the "flower" image concept could be temporally related to flower shop check-ins or tweets about flowers.
  • Cross-source user identification. The alignment of user accounts across multiple social resources can benefit from user profile compilation.
  • Cross-region user profiling and community matching. This direction may offer insight into differences and similarities between users' preferences.
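As a starting point for the causality topic above, the sketch below pairs events from two sources that fall within a time window of each other, for example a "flower" image concept followed by a flower shop check-in. It measures temporal co-occurrence only, which is weaker than causality. The `Event` record shape, the `co_occurrences` helper, and the one-day window are assumptions made for the example, not part of the dataset.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record shape: (timestamp, label), e.g. an image concept
# or a check-in venue category.
Event = Tuple[datetime, str]


def co_occurrences(
    first: List[Event], second: List[Event], window: timedelta
) -> List[Tuple[Event, Event]]:
    """Return pairs (a, b) where b happens at or after a, within `window`."""
    second_sorted = sorted(second)
    pairs: List[Tuple[Event, Event]] = []
    for a in sorted(first):
        for b in second_sorted:
            delta = b[0] - a[0]
            if delta < timedelta(0):
                continue  # b happened before a
            if delta > window:
                break  # later events in the sorted list are even further away
            pairs.append((a, b))
    return pairs


# Example: one "flower" photo and three flower shop check-ins by the same user.
photos = [(datetime(2015, 1, 1, 10, 0), "flower")]
checkins = [
    (datetime(2015, 1, 1, 9, 0), "Flower Shop"),   # before the photo
    (datetime(2015, 1, 1, 12, 0), "Flower Shop"),  # two hours after
    (datetime(2015, 1, 5, 12, 0), "Flower Shop"),  # four days after
]
matches = co_occurrences(photos, checkins, window=timedelta(days=1))
```

Counts of such pairs across many users could then be compared against a shuffled-timestamp baseline before any causal claim is considered.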

For more details of this dataset and to reuse this dataset, please visit

Related Publications:
Citation: When using this data, please cite the original publication and also the dataset.
  • A. Farseev, N. Liqiang, M. Akbari, and T.-S. Chua. Harvesting multiple sources for user profile learning: a Big data study. ACM International Conference on Multimedia Retrieval (ICMR). China. June 23-26, 2015.
  • Farseev, A., Chua Tat Seng (2017-11-13). NUS MULTI-SOURCE Social DATASET (NUS-MSS). ScholarBank@NUS Repository. [Dataset].
Appears in Collections: Staff Dataset

Files in This Item:
There are no files associated with this item.
