Title: NUS Multi-Sensor Presentation (NUSMSP) Dataset
Creators: Gan Tian; Wong Yong Kang; Kankanhalli, Mohan S
NUS Contact: Wong Yong Kang
External Contact: Gan Tian
Keywords: Mel frequency cepstral coefficient
Oral presentations have long been an effective method for delivering information to an audience. In the past couple of decades, technological advancements have revolutionized the way humans deliver presentations. Unfortunately, for a variety of reasons, the quality of presentations varies widely, which affects their efficacy. Assessing the quality of a presentation usually requires painstaking manual analysis by experts. Expert feedback can certainly help people improve their presentation skills; unfortunately, manual evaluation of presentation quality by experts is not cost-effective and may not be available to most people.
In this work, we collected a novel NUS Multi-Sensor Presentation (NUSMSP) Dataset, which contains 51 real-world presentations recorded in a multi-sensor environment. The NUSMSP Dataset was recorded between December 2014 and February 2015 at the National University of Singapore (NUS). The dataset was collected in a meeting room equipped with two static cameras (with built-in microphones), one Kinect depth sensor, and three Google Glass devices. The dataset consists of 51 unique individuals (32 males and 19 females). Each subject was asked to prepare and deliver a 10- to 15-minute presentation with no restriction on the topic. For each recording (presentation), the number of audience members ranged from 4 to 8. In total, we have about 10 hours of valid presentation data. Due to unpredictable recording conditions, a small portion of the sensor data failed to record.
For each presentation, the ambient Kinect depth sensor (denoted as AM-K) captured the speaker's behavior as RGBD data. A high-resolution video recording the audience's behavior was captured by an ambient static camera (denoted as AM-S 1) at a resolution of 1920x1080 at 30fps in MP4 format. Meanwhile, another ambient static camera (denoted as AM-S 2) captured an overview of both the speaker's and the audience's behavior with the same specification. The speaker and two randomly chosen audience members were asked to wear a Google Glass. Each Google Glass recorded video at a resolution of 1280x720 at 30fps in MP4 format. In addition, the standard Android sensor data streams TYPE_LINEAR_ACCELERATION, TYPE_ACCELEROMETER, TYPE_LIGHT, TYPE_ROTATION_VECTOR, TYPE_MAGNETIC_FIELD, TYPE_GYROSCOPE, and TYPE_GRAVITY on the Glass were recorded at 10 Hz. All six sensors, except the Kinect depth sensor, have a built-in microphone, which recorded audio during the presentation. The five devices with audio data are synchronized by measuring the delays between their audio signals via cross-correlation. The Kinect depth sensor is synchronized with the rest by a periodic LED visual signal.
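The audio-based synchronization described above can be sketched as follows. This is a minimal illustration of estimating the delay between two devices' audio tracks via cross-correlation; the function and variable names are illustrative, not part of the dataset's tooling, and a real pipeline would first resample all tracks to a common sample rate.

```python
# Sketch: estimate the time offset between two audio recordings by
# finding the peak of their cross-correlation (assumed approach,
# consistent with the synchronization method described above).
import numpy as np

def estimate_delay(ref, other, sample_rate):
    """Estimate how many seconds `other` lags behind `ref`.

    A positive result means `other` starts later than `ref`.
    Both inputs are 1-D arrays sampled at `sample_rate` Hz.
    """
    # Full cross-correlation; its peak marks the best alignment.
    xcorr = np.correlate(other, ref, mode="full")
    # Index 0 of the 'full' output corresponds to a lag of -(len(ref)-1).
    lag_samples = int(np.argmax(xcorr)) - (len(ref) - 1)
    return lag_samples / sample_rate
```

In practice, one device would be chosen as the reference and every other audio track trimmed or padded by its estimated delay before aligning the video streams.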
Please ensure the original publications are cited appropriately when reusing this dataset. For more details, please refer to the Citation field.
The dataset is also available at http://mmas.comp.nus.edu.sg/NUSMSP.html.
Citation: When using this data, please cite the original publication and also the dataset.
License: Please refer to the document "Licence.txt".
Appears in Collections: Staff Dataset
Files in This Item:
readme.pdf (53.62 kB, Adobe PDF)