Hessian matrix distribution for Bayesian policy gradient reinforcement learning

Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.ins.2011.01.001

DC Field	Value
dc.title	Hessian matrix distribution for Bayesian policy gradient reinforcement learning
dc.contributor.author	Vien, N.A.
dc.contributor.author	Yu, H.
dc.contributor.author	Chung, T.
dc.date.accessioned	2013-07-04T07:28:46Z
dc.date.available	2013-07-04T07:28:46Z
dc.date.issued	2011
dc.identifier.citation	Vien, N.A., Yu, H., Chung, T. (2011). Hessian matrix distribution for Bayesian policy gradient reinforcement learning. Information Sciences 181 (9) : 1671-1685. ScholarBank@NUS Repository. https://doi.org/10.1016/j.ins.2011.01.001
dc.identifier.issn	00200255
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/38871
dc.description.abstract	Bayesian policy gradient algorithms have been recently proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods were known to reduce the variance and the number of samples needed to obtain accurate gradient estimates in comparison to the conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As in computing the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that with an appropriate choice of kernel, the computational complexity of Hessian distribution estimate is equal to that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and the Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10]. © 2011 Elsevier Inc. All rights reserved.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1016/j.ins.2011.01.001
dc.source	Scopus
dc.subject	Bayesian policy gradient
dc.subject	Hessian matrix distribution
dc.subject	Markov decision process
dc.subject	Monte-Carlo policy gradient
dc.subject	Policy gradient
dc.subject	Reinforcement learning
dc.type	Article
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1016/j.ins.2011.01.001
dc.description.sourcetitle	Information Sciences
dc.description.volume	181
dc.description.issue	9
dc.description.page	1671-1685
dc.description.coden	ISIJB
dc.identifier.isiut	000288774700011
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.
