Please use this identifier to cite or link to this item: https://doi.org/10.25540/BN6N-4R2M
Title: NUS Multi-Sensor Presentation (NUSMSP) Dataset
Creators: Gan, Tian
Wong, Yongkang
Mandal, Bappaditya
Li, Junnan
Chandrasekhar, Vijay
Kankanhalli, Mohan S.
NUS Contact: Wong, Yongkang
External Contact: Gan, Tian
Li, Junnan
Subject: Quantified Self
Multi-modal Analysis
Presentations
Sensors
Egocentric Vision
Skeleton
Feature extraction
Machine learning
Speech recognition
Mel frequency cepstral coefficient
Robustness
DOI: 10.25540/BN6N-4R2M
Description: 

Oral presentation has been an effective method for delivering information to a group of participants for many years. Over the past couple of decades, technological advancements have revolutionized the way humans deliver presentations. Unfortunately, for a variety of reasons, presentation quality varies widely, which affects how effectively information is conveyed. Assessing the quality of a presentation usually requires painstaking manual analysis by experts, and while expert feedback can certainly help people improve their presentation skills, such manual evaluation is not cost-effective and may not be available to most people.

In this work, we collected a novel NUS Multi-Sensor Presentation (NUSMSP) Dataset, which contains 51 real-world presentations recorded in a multi-sensor environment. The NUSMSP Dataset was recorded between December 2014 and February 2015 at the National University of Singapore (NUS). The data was collected in a meeting room equipped with two static cameras (with built-in microphones), one Kinect depth sensor, and three Google Glasses. The dataset covers 51 unique individuals (32 males and 19 females). Each subject was asked to prepare and deliver a 10- to 15-minute presentation with no restriction on the topic. For each recording (presentation), the number of audience members ranged from 4 to 8. In total, we have about 10 hours of valid presentation data. Due to unpredictable recording conditions, a small portion of the sensor data failed to capture the presentation.

For each presentation, the ambient Kinect depth sensor (denoted as AM-K) captured the speaker's behavior as RGBD data. A high-resolution video of the audience's behavior was captured by an ambient static camera (denoted as AM-S 1) at 1920x1080 and 30 fps in MP4 format. Meanwhile, another ambient static camera (denoted as AM-S 2) captured an overview of both the speaker's and the audience's behavior with the same specification. The speaker and two randomly chosen audience members were asked to wear a Google Glass. Each Google Glass recorded video at 1280x720 and 30 fps in MP4 format. In addition, the standard Android sensor streams TYPE_LINEAR_ACCELERATION, TYPE_ACCELEROMETER, TYPE_LIGHT, TYPE_ROTATION_VECTOR, TYPE_MAGNETIC_FIELD, TYPE_GYROSCOPE, and TYPE_GRAVITY on each Glass were recorded at 10 Hz. All six sensors except the Kinect depth sensor have a built-in microphone, which recorded audio during the presentation. The five audio-equipped devices were synchronized by measuring the delay between their audio signals via cross-correlation. The Kinect depth sensor was synchronized with the rest using a periodic LED visual signal.
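The cross-correlation step above can be illustrated with a minimal sketch: locate the peak of the full cross-correlation between two audio tracks and convert the lag to seconds. This is a generic illustration of the technique, not the dataset authors' actual synchronization code; the function name and the synthetic signals are our own.

```python
import numpy as np

def estimate_delay(ref: np.ndarray, sig: np.ndarray, rate: int) -> float:
    """Estimate the delay (in seconds) of `sig` relative to `ref`
    by locating the peak of their full cross-correlation."""
    corr = np.correlate(sig, ref, mode="full")
    # In "full" mode, zero lag sits at index len(ref) - 1.
    lag = int(np.argmax(corr)) - (len(ref) - 1)
    return lag / rate

# Synthetic check: the same 440 Hz tone, with `sig` starting 0.5 s later.
rate = 8000
t = np.arange(rate) / rate                      # 1 s of samples
tone = np.sin(2 * np.pi * 440 * t)
ref = np.concatenate([tone, np.zeros(rate)])
sig = np.concatenate([np.zeros(rate // 2), tone, np.zeros(rate // 2)])
print(estimate_delay(ref, sig, rate))           # ~0.5
```

Once the pairwise delays are known, each stream can be trimmed or offset so that all recordings share a common timeline.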

Please ensure the original publications are cited appropriately when reusing this dataset. For more details, please refer to the Citation field.

  • T. Gan, Y. Wong, B. Mandal, V. Chandrasekhar, M. Kankanhalli, "Multi-sensor Self-Quantification of Presentations," ACM Multimedia, pp. 601-610, 2015.
  • J. Li, Y. Wong, M. Kankanhalli, "Multi-stream Deep Learning Framework for Automated Presentation Assessment," IEEE International Symposium on Multimedia (ISM), 2016.

The dataset is also available at http://mmas.comp.nus.edu.sg/NUSMSP.html.

Related Publications: 10.1145/2733373.2806252
10.1109/ISM.2016.0051
Citation: When using this data, please cite the original publication and also the dataset.
  • T. Gan, Y. Wong, B. Mandal, V. Chandrasekhar, M. Kankanhalli, "Multi-sensor Self-Quantification of Presentations," ACM Multimedia, pp. 601-610, 2015.
  • J. Li, Y. Wong, M. Kankanhalli, "Multi-stream Deep Learning Framework for Automated Presentation Assessment," IEEE International Symposium on Multimedia (ISM), 2016.
  • Gan, Tian; Wong, Yongkang; Mandal, Bappaditya; Li, Junnan; Chandrasekhar, Vijay; Kankanhalli, Mohan S. (2017-11-06). NUS Multi-Sensor Presentation (NUSMSP) Dataset. ScholarBank@NUS Repository. [Dataset]. https://doi.org/10.25540/BN6N-4R2M
License: Please refer to the document "Licence.txt".
Appears in Collections:Staff Dataset

Files in This Item:
File                    Size      Format     Access
readme.pdf              53.62 kB  Adobe PDF  Open
Licence.txt             2.36 kB   Text       Open
Annotation_2016.tar.gz  91.44 kB  Unknown    Open
NUSMSP_01.tar.xz        13.73 GB  Unknown    Open
NUSMSP_02.tar.xz        14.2 GB   Unknown    Open
NUSMSP_03.tar.xz        15.83 GB  Unknown    Open
NUSMSP_04.tar.xz        13.78 GB  Unknown    Open
NUSMSP_05.tar.xz        13.9 GB   Unknown    Open
NUSMSP_06.tar.xz        13.14 GB  Unknown    Open

Page view(s): 449 (checked on Aug 16, 2018)
Download(s): 218 (checked on Aug 16, 2018)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.