Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/172670

Title:	Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives
Authors:	CHEUNG WANG CHI
Issue Date:	31-Dec-2019
Citation:	CHEUNG WANG CHI (2019-12-31). Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives. Neural Information Processing Systems (NeurIPS) 32 : 724-734. ScholarBank@NUS Repository.
Abstract:	We consider an agent who is involved in an online Markov decision process and receives a vector of outcomes every round. The agent aims to simultaneously optimize multiple objectives associated with the multi-dimensional outcomes. Due to state transitions, it is challenging to balance the vectorial outcomes for achieving near-optimality. In particular, contrary to the single-objective case, stationary policies are generally sub-optimal. We propose a no-regret algorithm based on the Frank-Wolfe algorithm (Frank and Wolfe 1956), UCRL2 (Jaksch et al. 2010), and a crucial and novel gradient threshold procedure. The procedure involves carefully delaying gradient updates, and returns a non-stationary policy that diversifies the outcomes for optimizing the objectives.
Source Title:	Neural Information Processing Systems (NeurIPS)
URI:	https://scholarbank.nus.edu.sg/handle/10635/172670
Appears in Collections:	Staff Publications Elements
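
Illustrative note on the abstract: the sketch below shows only the Frank-Wolfe ingredient, in a deliberately simplified stateless setting. It is not the paper's algorithm: it has no state transitions, no UCRL2-style model estimation, and no gradient threshold procedure, and it says nothing about the sub-optimality of stationary policies, which the abstract attributes to state transitions. The outcome vectors, the quadratic objective, and all function names are assumptions made for illustration. Each round, the agent takes the gradient of a concave objective at the running average outcome and picks the action whose outcome vector best aligns with it.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

# Hypothetical setup: two actions, each yielding a fixed 2-dimensional outcome.
OUTCOMES: List[Vector] = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical target for the average outcome vector.
TARGET: Vector = [0.5, 0.5]


def grad_toward_target(avg: Sequence[float]) -> Vector:
    """Gradient of the concave objective g(v) = -0.5 * ||v - TARGET||^2."""
    return [t - v for t, v in zip(TARGET, avg)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def frank_wolfe_actions(
    outcomes: Sequence[Sequence[float]],
    grad: Callable[[Sequence[float]], Vector],
    rounds: int,
) -> Tuple[List[int], Vector]:
    """Greedy Frank-Wolfe-style action selection on the running average outcome.

    Returns the chosen action indices and the final average outcome vector.
    """
    dim = len(outcomes[0])
    total = [0.0] * dim
    actions: List[int] = []
    for t in range(1, rounds + 1):
        # Average of the outcomes observed so far (zero vector before round 1).
        avg = [s / max(t - 1, 1) for s in total]
        g = grad(avg)
        # Linear maximization step: the action best aligned with the gradient.
        # Ties are broken in favor of the lowest action index.
        best = max(range(len(outcomes)), key=lambda k: dot(g, outcomes[k]))
        actions.append(best)
        total = [s + x for s, x in zip(total, outcomes[best])]
    return actions, [s / rounds for s in total]
```

Calling `frank_wolfe_actions(OUTCOMES, grad_toward_target, 100)` in this toy setting yields an action sequence that switches between the two actions rather than committing to one, so the average outcome is balanced across both coordinates.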

File	Description	Size	Format	Access Settings	Version
multi-objective_RL_nips_full.pdf		1.38 MB	Adobe PDF	OPEN	Post-print	View/Download
