Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/150348
Title: REPRESENTATION LEARNING OF DATA WITH MULTIPLE MODALITIES WITH APPLICATIONS TO VISUAL QUESTION ANSWERING
Authors: ILIJA ILIEVSKI
Keywords: deep learning, multimodal data, vqa, neural attention
Issue Date: 24-Aug-2018
Citation: ILIJA ILIEVSKI (2018-08-24). REPRESENTATION LEARNING OF DATA WITH MULTIPLE MODALITIES WITH APPLICATIONS TO VISUAL QUESTION ANSWERING. ScholarBank@NUS Repository.
Abstract: Deep learning has started a new era in Artificial Intelligence research, with major breakthroughs in multiple fields. Now, as the field strives toward Artificial General Intelligence, the focus has shifted to tasks involving data of multiple modalities. In this thesis, I address the challenges of representation learning from multimodal data. First, I develop a novel multimodal representation learning and fusion method. The proposed method employs a modular deep neural network in which each module learns a representation of a different aspect of the data, yielding a complete and multifaceted representation. The modules' representations are then fused into a single joint representation via a bilinear model that learns the complex interrelationships among the individual representations. Next, I design two types of neural attention mechanisms. The attention mechanisms adapt the individual representations to each other, conditioned on the task at hand, producing a more focused multimodal representation. Finally, I propose a novel loss function that improves the training convergence and overall performance of complex, modular deep neural models of multimodal data.
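
To illustrate the bilinear fusion idea summarized in the abstract, the following is a minimal, hypothetical PyTorch sketch of low-rank bilinear fusion of an image feature and a question embedding, as commonly used in visual question answering. The class name, feature dimensions, and rank are illustrative assumptions only and do not reproduce the thesis's actual architecture, modules, or attention mechanisms.

import torch
import torch.nn as nn

class BilinearFusion(nn.Module):
    """Illustrative low-rank bilinear fusion of two modality representations
    into a single joint vector (a common VQA fusion pattern; not the thesis's model)."""

    def __init__(self, img_dim, txt_dim, joint_dim, rank=256):
        super().__init__()
        # Project each modality into a shared rank-dimensional space.
        self.img_proj = nn.Linear(img_dim, rank)
        self.txt_proj = nn.Linear(txt_dim, rank)
        # Map the bilinear-style interaction to the joint representation space.
        self.out_proj = nn.Linear(rank, joint_dim)

    def forward(self, img_feat, txt_feat):
        # The element-wise product of the two projections approximates a full
        # bilinear interaction with far fewer parameters.
        interaction = torch.tanh(self.img_proj(img_feat)) * torch.tanh(self.txt_proj(txt_feat))
        return self.out_proj(interaction)

# Example (hypothetical dimensions): fuse a 2048-d image feature
# with a 1024-d question embedding into a 512-d joint representation.
fusion = BilinearFusion(img_dim=2048, txt_dim=1024, joint_dim=512)
joint = fusion(torch.randn(8, 2048), torch.randn(8, 1024))  # shape: (8, 512)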
URI: http://scholarbank.nus.edu.sg/handle/10635/150348
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: IlievskiI.pdf (9.08 MB, Adobe PDF)
Access Settings: OPEN

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.