Please use this identifier to cite or link to this item: https://doi.org/10.25540/BN6N-4R2M
Title: NUS Multi-Sensor Presentation (NUSMSP) Dataset
Creators: Gan, Tian
Wong, Yong Kang
Mandal, Bappaditya
Li, Junnan
Chandrasekhar, Vijay
Kankanhalli, Mohan S
NUS Contact: Yong Kang Wong
External Contact: Gan Tian
Li Junnan
Subject: Quantified Self
Multi-modal Analysis
Presentations
Sensors
Egocentric Vision
Skeleton
Feature extraction
Machine learning
Speech recognition
Mel frequency cepstral coefficient
Robustness
DOI: 10.25540/BN6N-4R2M
Description: 

Oral presentation has long been an effective method for delivering information to a group of participants. Over the past couple of decades, technological advancements have revolutionized the way people deliver presentations. Unfortunately, for a variety of reasons, the quality of presentations can vary considerably, which affects their efficacy. Assessing the quality of a presentation usually requires painstaking manual analysis by experts, and although expert feedback can certainly help people improve their presentation skills, such manual evaluation is not cost effective and is not available to most people.

In this work, we collected the NUS Multi-Sensor Presentation (NUSMSP) Dataset, a novel collection of 51 real-world presentations recorded in a multi-sensor environment. The dataset was recorded between December 2014 and February 2015 at the National University of Singapore (NUS), in a meeting room equipped with two static cameras (with built-in microphones), one Kinect depth sensor, and three Google Glasses. It covers 51 unique individuals (32 males and 19 females). Each subject was asked to prepare and deliver a 10- to 15-minute presentation with no restriction on the topic, and the audience for each presentation ranged from 4 to 8 members. In total, the dataset contains about 10 hours of valid presentation data; owing to unpredictable recording conditions, a small portion of the sensor data failed to capture the presentations.

For each presentation, the ambient Kinect depth sensor (denoted AM-K) captured the speaker's behavior as RGB-D data. A high-resolution video of the audience's behavior was captured by one ambient static camera (denoted AM-S 1) at a resolution of 1920x1080 and 30 fps in MP4 format, while a second ambient static camera (denoted AM-S 2) captured an overview of both the speaker and the audience with the same specifications. The speaker and two randomly chosen audience members each wore a Google Glass, which recorded video at a resolution of 1280x720 and 30 fps in MP4 format. In addition, the standard Android sensor streams TYPE_LINEAR_ACCELERATION, TYPE_ACCELEROMETER, TYPE_LIGHT, TYPE_ROTATION_VECTOR, TYPE_MAGNETIC_FIELD, TYPE_GYROSCOPE, and TYPE_GRAVITY on each Glass were recorded at 10 samples per second. All six sensors except the Kinect depth sensor have a built-in microphone that recorded audio during the presentation. The five audio-capable devices are synchronized by measuring the delay between their audio signals via cross-correlation; the Kinect depth sensor is synchronized with the rest using a periodic LED visual signal.
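
As a concrete illustration of the audio-based synchronization described above, the following is a minimal sketch (not the authors' code) of how the time offset between two audio tracks can be estimated via cross-correlation. The SciPy-based implementation and the file names are assumptions for illustration only.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import correlate

    def estimate_offset_seconds(ref_wav, other_wav):
        # Load both recordings; wavfile.read returns (sample_rate, samples).
        sr_ref, ref = wavfile.read(ref_wav)
        sr_other, other = wavfile.read(other_wav)
        assert sr_ref == sr_other, "resample first if the sample rates differ"
        # Keep one channel and remove the mean for a cleaner correlation peak.
        ref = (ref[:, 0] if ref.ndim > 1 else ref).astype(np.float64)
        other = (other[:, 0] if other.ndim > 1 else other).astype(np.float64)
        ref -= ref.mean()
        other -= other.mean()
        # The lag of the cross-correlation peak is the delay of `other`
        # relative to `ref`, in samples; convert it to seconds.
        corr = correlate(other, ref, mode="full")
        lag = np.argmax(corr) - (len(ref) - 1)
        return lag / sr_ref

    # Hypothetical usage: align a Glass audio track to an ambient camera track.
    # offset = estimate_offset_seconds("AM-S1_audio.wav", "WE-G_speaker.wav")

In practice, the correlation can be computed on a short excerpt of each recording to keep memory usage low.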

Please ensure the original publications are cited appropriately when reusing this dataset. For more details, please refer to the Citation field below.

  • T. Gan, Y. Wong, B. Mandal, V. Chandrasekhar, and M. Kankanhalli. Multi-sensor Self-Quantification of Presentations. In ACM Multimedia, pp. 601-610, 2015.
  • J. Li, Y. Wong, and M. Kankanhalli. Multi-stream Deep Learning Framework for Automated Presentation Assessment. In IEEE International Symposium on Multimedia (ISM), 2016.

The dataset is also available at http://mmas.comp.nus.edu.sg/NUSMSP.html.

Related Publications: 10.1145/2733373.2806252
10.1109/ISM.2016.0051
Citation: When using this data, please cite the original publications as well as the dataset.
  • T. Gan, Y. Wong, B. Mandal, V. Chandrasekhar, and M. Kankanhalli. Multi-sensor Self-Quantification of Presentations. In ACM Multimedia, pp. 601-610, 2015.
  • J. Li, Y. Wong, and M. Kankanhalli. Multi-stream Deep Learning Framework for Automated Presentation Assessment. In IEEE International Symposium on Multimedia (ISM), 2016.
  • Gan, Tian; Wong, Yong Kang; Mandal, Bappaditya; Li, Junnan; Chandrasekhar, Vijay; Kankanhalli, Mohan S (2017-11-06). NUS Multi-Sensor Presentation (NUSMSP) Dataset. ScholarBank@NUS Repository. [Dataset]. https://doi.org/10.25540/BN6N-4R2M
License: Please refer to the document "Licence.txt".
Appears in Collections: Staff Dataset

Files in This Item:
  • readme.pdf (53.62 kB, Adobe PDF): OPEN
  • Licence.txt (2.36 kB, Text): OPEN
  • Annotation_2016.tar.gz (91.44 kB): OPEN
  • NUSMSP_01.tar.xz (13.73 GB): OPEN
  • NUSMSP_02.tar.xz (14.2 GB): OPEN
  • NUSMSP_03.tar.xz (15.83 GB): OPEN
  • NUSMSP_04.tar.xz (13.78 GB): OPEN
  • NUSMSP_05.tar.xz (13.9 GB): OPEN
  • NUSMSP_06.tar.xz (13.14 GB): OPEN
