Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/204295
DC Field: Value
dc.title: State-Aware Variational Thompson Sampling for Deep Q-Networks
dc.contributor.author: WEE SUN LEE
dc.contributor.author: Siddharth Aravindan
dc.date.accessioned: 2021-10-27T00:54:29Z
dc.date.available: 2021-10-27T00:54:29Z
dc.date.issued: 2021-05-03
dc.identifier.citation: WEE SUN LEE, Siddharth Aravindan (2021-05-03). State-Aware Variational Thompson Sampling for Deep Q-Networks. AAMAS '21: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems: 124-132. ScholarBank@NUS Repository.
dc.identifier.arxiv: arXiv:2102.03719v1
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/204295
dc.description.abstract: Thompson sampling is a well-known approach for balancing exploration and exploitation in reinforcement learning. It requires a posterior distribution over action-value functions to be maintained, which is generally intractable for tasks with high-dimensional state-action spaces. We derive a variational Thompson sampling approximation for DQNs that uses a deep network whose parameters are perturbed by a learned variational noise distribution. We interpret the successful NoisyNets method [10] as an approximation to this variational Thompson sampling method. Further, we propose State Aware Noisy Exploration (SANE), which seeks to improve on NoisyNets by allowing non-uniform perturbation, where the amount of parameter perturbation is conditioned on the state of the agent. This is done with the help of an auxiliary perturbation module, whose output is state dependent and which is learned end to end by gradient descent. We hypothesize that such state-aware noisy exploration is particularly useful in problems where exploration in certain high-risk states may cause the agent to fail badly. We demonstrate the effectiveness of the state-aware exploration method in the off-policy setting by augmenting DQNs with the auxiliary perturbation module (see the illustrative sketch after the record below).
dc.language.iso: en
dc.publisher: AAMAS
dc.subject: Deep Reinforcement Learning; Thompson Sampling; Exploration
dc.type: Conference Paper
dc.contributor.department: DEAN'S OFFICE (SCHOOL OF COMPUTING)
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.sourcetitle: AAMAS '21: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems
dc.description.page: 124-132
dc.published.state: Published
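
The following is a minimal, hedged sketch of the state-aware perturbation idea described in the abstract, not the authors' implementation. It assumes a PyTorch-style DQN with a single noisy output layer; the class name SaneQNetwork, the layer sizes, the use of one scalar noise scale per state, and the choice to perturb only the Q-head are illustrative assumptions.

# Illustrative sketch (assumed architecture, not the paper's exact code):
# an auxiliary module maps state features to a positive noise scale, and
# that scale modulates Gaussian noise on the Q-head's parameters.
import torch
import torch.nn as nn


class SaneQNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
        )
        # Mean parameters of the (noisy) Q-head.
        self.q_head = nn.Linear(hidden, num_actions)
        # Auxiliary perturbation module: state features -> scalar noise scale.
        # Softplus keeps the learned scale positive; trained end to end.
        self.perturb = nn.Sequential(
            nn.Linear(hidden, 1), nn.Softplus(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        sigma = self.perturb(h)  # (batch, 1): state-aware perturbation scale
        # Fresh Gaussian perturbation for the head's weights and bias;
        # reparameterized so gradients flow into sigma (and the perturbation
        # module) through the sampled Q-values.
        eps_w = torch.randn_like(self.q_head.weight)
        eps_b = torch.randn_like(self.q_head.bias)
        mean_q = self.q_head(h)
        noise_q = nn.functional.linear(h, eps_w, eps_b)  # noise contribution
        # Q-values computed with head parameters perturbed in proportion
        # to sigma(state).
        return mean_q + sigma * noise_q


# Usage: each forward pass draws fresh noise, giving a Thompson-style
# sampled Q-function; acting greedily on it yields the exploratory action.
if __name__ == "__main__":
    net = SaneQNetwork(state_dim=8, num_actions=4)
    state = torch.randn(2, 8)
    q_values = net(state)
    action = q_values.argmax(dim=1)
    print(q_values.shape, action)

Because a linear layer is linear in its parameters, adding sigma(s) * (eps_w h + eps_b) is the same as evaluating the head with weights theta + sigma(s) * eps, i.e., a state-dependent parameter perturbation: a small sigma(s) in high-risk states keeps behavior close to the mean Q-function, while a large sigma(s) encourages exploration elsewhere.
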
Appears in Collections:Elements
Students Publications
Staff Publications

Files in This Item:
File: AAMAS.pdf
Size: 4.31 MB
Format: Adobe PDF
Access Settings: OPEN
Version: Post-print

Page view(s): 75 (checked on Jan 13, 2022)
Download(s): 2 (checked on Jan 13, 2022)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.