Finite-time analysis for double Q-learning | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/236097

Title:	Finite-time analysis for double Q-learning
Authors:	Xiong, H Zhao, L Liang, Y Zhang, W
Keywords:	cs.LG cs.LG math.OC stat.ML
Issue Date:	12-Dec-2020
Citation:	Xiong, H, Zhao, L, Liang, Y, Zhang, W (2020-12-12). Finite-time analysis for double Q-learning. Advances in Neural Information Processing Systems 33 2020-December. ScholarBank@NUS Repository.
Abstract:	Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in Hasselt (2010) overcomes such an overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited. So far only the asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an -accurate neighborhood of the global optimum by taking Ω˜ 1 (1−γ) 6 2 1 ω + 1 1−γ 1 1−ω iterations, where ω ∈ (0, 1) is the decay parameter of the learning rate, and γ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature of stochastic approximation.
Source Title:	Advances in Neural Information Processing Systems 33
URI:	https://scholarbank.nus.edu.sg/handle/10635/236097
ISSN:	1049-5258
Appears in Collections:	Staff Publications Elements

Show full item record

Files in This Item:

File	Description	Size	Format	Access Settings	Version
NeurIPS-2020-finite-time-analysis-for-double-q-learning-Paper.pdf		321.57 kB	Adobe PDF	OPEN	None	View/Download

Google Scholar^TM

Check

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.