Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/241491
Title: IMPROVING ATTENTION-BASED DEEP LEARNING MODELS WITH LOCALITY
Authors: JIANG ZIHANG
ORCID iD: orcid.org/0000-0002-8096-842X
Keywords: network architecture, deep learning, attention, local, transformer, machine learning
Issue Date: 19-Dec-2022
Citation: JIANG ZIHANG (2022-12-19). IMPROVING ATTENTION-BASED DEEP LEARNING MODELS WITH LOCALITY. ScholarBank@NUS Repository.
Abstract: Recent attention-based deep learning models adopt the transformer architecture, which uses an attention mechanism to determine which parts of the input the model should attend to. Notably, these models significantly outperform traditional CNN- and RNN-based methods. However, because of the permutation-invariance property of attention, these models suffer from data inefficiency during training. They also suffer from redundancy in the attention mechanism, so their computational cost is generally very high. We therefore propose to introduce locality to improve attention-based models. Specifically, we propose a novel token labeling training objective that provides dense local supervision and improves training efficiency. To address the redundancy in the attention mechanism, we design novel operators that are local by construction. We introduce two carefully designed modules, outlook attention and span-based dynamic convolution, to improve efficiency and enhance model performance.
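For illustration only, below is a minimal single-head sketch of a locally constrained operator in the spirit of the outlook attention module named in the abstract; the class and variable names are our own, and the thesis implementation may differ (e.g. multiple heads and a strided variant). The idea it shows: each spatial location predicts the attention weights for its own K x K neighbourhood directly from its embedding, so aggregation is local by construction and avoids query-key dot products over the whole sequence.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OutlookAttentionSketch(nn.Module):
        """Minimal single-head sketch of a local, outlook-style attention operator."""

        def __init__(self, dim, kernel_size=3):
            super().__init__()
            self.k = kernel_size
            self.v = nn.Linear(dim, dim)                  # value projection
            self.attn = nn.Linear(dim, kernel_size ** 4)  # window weights predicted from the centre token
            self.proj = nn.Linear(dim, dim)
            self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

        def forward(self, x):                             # x: (B, H, W, C)
            B, H, W, C = x.shape
            k2 = self.k * self.k

            # gather the values of every K x K neighbourhood: (B, H*W, k2, C)
            v = self.v(x).permute(0, 3, 1, 2)             # (B, C, H, W)
            v = self.unfold(v).reshape(B, C, k2, H * W).permute(0, 3, 2, 1)

            # attention weights come directly from each location's embedding,
            # one k2 x k2 matrix per location, normalised over the window
            attn = self.attn(x).reshape(B, H * W, k2, k2).softmax(dim=-1)

            # weighted aggregation inside each local window: (B, H*W, k2, C)
            out = attn @ v

            # fold the K x K outputs back onto the spatial grid (overlaps are summed)
            out = out.permute(0, 3, 2, 1).reshape(B, C * k2, H * W)
            out = F.fold(out, output_size=(H, W), kernel_size=self.k, padding=self.k // 2)
            return self.proj(out.permute(0, 2, 3, 1))     # back to (B, H, W, C)

    # usage sketch: x = torch.randn(2, 14, 14, 192); y = OutlookAttentionSketch(192)(x)  # -> (2, 14, 14, 192)

Because the weights are generated by a single linear layer rather than by comparing all token pairs, the cost grows with the window size K rather than with the full sequence length, which is the kind of locality the abstract argues for.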
URI: https://scholarbank.nus.edu.sg/handle/10635/241491
Appears in Collections:Ph.D Theses (Open)

Files in This Item:
File: JiangZH.pdf | Size: 3.16 MB | Format: Adobe PDF | Access Settings: OPEN


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.