Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/236097
Title: Finite-time analysis for double Q-learning
Authors: Xiong, H
Zhao, L 
Liang, Y
Zhang, W
Keywords: cs.LG
cs.LG
math.OC
stat.ML
Issue Date: 12-Dec-2020
Citation: Xiong, H, Zhao, L, Liang, Y, Zhang, W (2020-12-12). Finite-time analysis for double Q-learning. Advances in Neural Information Processing Systems 33 2020-December. ScholarBank@NUS Repository.
Abstract: Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in Hasselt (2010) overcomes such an overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited. So far only the asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an -accurate neighborhood of the global optimum by taking Ω˜ 1 (1−γ) 6 2 1 ω + 1 1−γ 1 1−ω iterations, where ω ∈ (0, 1) is the decay parameter of the learning rate, and γ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature of stochastic approximation.
Source Title: Advances in Neural Information Processing Systems 33
URI: https://scholarbank.nus.edu.sg/handle/10635/236097
ISSN: 1049-5258
Appears in Collections:Staff Publications
Elements

Show full item record
Files in This Item:
File Description SizeFormatAccess SettingsVersion 
NeurIPS-2020-finite-time-analysis-for-double-q-learning-Paper.pdf321.57 kBAdobe PDF

OPEN

NoneView/Download

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.