Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/150348
Title: REPRESENTATION LEARNING OF DATA WITH MULTIPLE MODALITIES WITH APPLICATIONS TO VISUAL QUESTION ANSWERING
Authors: ILIJA ILIEVSKI
ORCID iD: orcid.org/http-s://-orci-d.or
Keywords: deep learning, multimodal data, vqa, neural attention
Issue Date: 24-Aug-2018
Citation: ILIJA ILIEVSKI (2018-08-24). REPRESENTATION LEARNING OF DATA WITH MULTIPLE MODALITIES WITH APPLICATIONS TO VISUAL QUESTION ANSWERING. ScholarBank@NUS Repository.
Abstract: Deep learning has started a new era in Artificial Intelligence research, with major breakthroughs in multiple fields. Now, as the field strives toward Artificial General Intelligence, the focus has shifted to tasks involving data of multiple modalities. In this thesis, I address the challenges of representation learning of multimodal data. First, I develop a novel multimodal representation learning and fusion method. The proposed method employs a modular deep neural network in which each module learns a representation of a different aspect of the data, yielding a complete and multifaceted representation. The modules' representations are then fused into a single joint representation via a bilinear model that learns the complex interrelationships among the individual representations. Next, I design two types of neural attention mechanisms. The attention mechanisms intelligently adapt the individual representations to each other, given the particular task, achieving a superior and focused multimodal representation. Finally, I propose a novel loss function to improve the training convergence and overall performance of complex, modular deep neural models of multimodal data.
URI: http://scholarbank.nus.edu.sg/handle/10635/150348
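The bilinear fusion step described in the abstract can be illustrated with a minimal sketch: each component of the joint representation is a bilinear form over two modality representations (e.g., an image-module vector and a question-module vector). The dimensions, variable names, and the function `bilinear_fuse` below are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the thesis).
d_img, d_q, d_joint = 8, 6, 4

# Hypothetical bilinear tensor W: one (d_img x d_q) matrix per joint feature.
W = rng.normal(size=(d_joint, d_img, d_q))

def bilinear_fuse(v_img, v_q):
    """Fuse two modality representations: z[k] = v_img^T @ W[k] @ v_q."""
    return np.einsum('i,kij,j->k', v_img, W, v_q)

v_img = rng.normal(size=d_img)  # image-module representation
v_q = rng.normal(size=d_q)      # question-module representation
z = bilinear_fuse(v_img, v_q)   # joint representation, shape (d_joint,)
```

In practice, bilinear fusion models of this kind are often factorized into low-rank projections to keep the parameter count of `W` manageable; the full-tensor form here is only meant to show the interaction structure.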
Appears in Collections: Ph.D Theses (Open)
Files in This Item:
IlievskiI.pdf (9.08 MB, Adobe PDF)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.