Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/241491
Title: IMPROVING ATTENTION-BASED DEEP LEARNING MODELS WITH LOCALITY
Authors: JIANG ZIHANG
ORCID iD: orcid.org/0000-0002-8096-842X
Keywords: network architecture, deep learning, attention, local, transformer, machine learning
Issue Date: 19-Dec-2022
Citation: JIANG ZIHANG (2022-12-19). IMPROVING ATTENTION-BASED DEEP LEARNING MODELS WITH LOCALITY. ScholarBank@NUS Repository.
Abstract: Recent attention-based deep learning models adopt the transformer architecture, which uses an attention mechanism to determine which parts of the input the model should attend to. Notably, these models significantly outperform traditional CNN- and RNN-based methods. However, because of the permutation-invariance property of attention, these models suffer from data inefficiency during training. They also suffer from redundancy in the attention mechanism, so their computational cost is generally very high. We therefore propose to introduce locality to improve attention-based models. Specifically, we propose a novel token labeling training objective that provides dense local supervision and improves training efficiency. To address the redundancy in the attention mechanism, we design novel operators that are local by construction. We introduce two carefully designed modules, outlook attention and span-based dynamic convolution, to improve efficiency and enhance model performance.
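For illustration only, below is a minimal single-head sketch of a locally constrained operator in the spirit of the outlook attention module named in the abstract; the class and variable names are our own, and the thesis implementation may differ (e.g. multiple heads and a strided variant). The idea it shows: each spatial location predicts the attention weights for its own K x K neighbourhood directly from its embedding, so aggregation is local by construction and avoids query-key dot products over the whole sequence.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OutlookAttentionSketch(nn.Module):
        """Minimal single-head sketch of a local, outlook-style attention operator."""

        def __init__(self, dim, kernel_size=3):
            super().__init__()
            self.k = kernel_size
            self.v = nn.Linear(dim, dim)                  # value projection
            self.attn = nn.Linear(dim, kernel_size ** 4)  # window weights predicted from the centre token
            self.proj = nn.Linear(dim, dim)
            self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

        def forward(self, x):                             # x: (B, H, W, C)
            B, H, W, C = x.shape
            k2 = self.k * self.k

            # gather the values of every K x K neighbourhood: (B, H*W, k2, C)
            v = self.v(x).permute(0, 3, 1, 2)             # (B, C, H, W)
            v = self.unfold(v).reshape(B, C, k2, H * W).permute(0, 3, 2, 1)

            # attention weights come directly from each location's embedding,
            # one k2 x k2 matrix per location, normalised over the window
            attn = self.attn(x).reshape(B, H * W, k2, k2).softmax(dim=-1)

            # weighted aggregation inside each local window: (B, H*W, k2, C)
            out = attn @ v

            # fold the K x K outputs back onto the spatial grid (overlaps are summed)
            out = out.permute(0, 3, 2, 1).reshape(B, C * k2, H * W)
            out = F.fold(out, output_size=(H, W), kernel_size=self.k, padding=self.k // 2)
            return self.proj(out.permute(0, 2, 3, 1))     # back to (B, H, W, C)

    # usage sketch: x = torch.randn(2, 14, 14, 192); y = OutlookAttentionSketch(192)(x)  # -> (2, 14, 14, 192)

Because the weights are generated by a single linear layer rather than by comparing all token pairs, the cost grows with the window size K rather than with the full sequence length, which is the kind of locality the abstract argues for.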
URI: https://scholarbank.nus.edu.sg/handle/10635/241491
Appears in Collections:Ph.D Theses (Open)

Files in This Item:
File: JiangZH.pdf | Size: 3.16 MB | Format: Adobe PDF | Access Settings: OPEN


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.