Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/236097
Title: | Finite-time analysis for double Q-learning | Authors: | Xiong, H Zhao, L Liang, Y Zhang, W |
Keywords: | cs.LG cs.LG math.OC stat.ML |
Issue Date: | 12-Dec-2020 | Citation: | Xiong, H, Zhao, L, Liang, Y, Zhang, W (2020-12-12). Finite-time analysis for double Q-learning. Advances in Neural Information Processing Systems 33 2020-December. ScholarBank@NUS Repository. | Abstract: | Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in Hasselt (2010) overcomes such an overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited. So far only the asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an -accurate neighborhood of the global optimum by taking Ω˜ 1 (1−γ) 6 2 1 ω + 1 1−γ 1 1−ω iterations, where ω ∈ (0, 1) is the decay parameter of the learning rate, and γ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature of stochastic approximation. | Source Title: | Advances in Neural Information Processing Systems 33 | URI: | https://scholarbank.nus.edu.sg/handle/10635/236097 | ISSN: | 1049-5258 |
Appears in Collections: | Staff Publications Elements |
Show full item record
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
NeurIPS-2020-finite-time-analysis-for-double-q-learning-Paper.pdf | 321.57 kB | Adobe PDF | OPEN | None | View/Download |
Google ScholarTM
Check
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.