Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/172670
Title: Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives
Authors: CHEUNG WANG CHI
Issue Date: 31-Dec-2019
Citation: CHEUNG WANG CHI (2019-12-31). Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives. Neural Information Processing Systems (NeurIPS) 32: 724-734. ScholarBank@NUS Repository.
Abstract: We consider an agent acting in an online Markov decision process who receives a vector of outcomes every round. The agent aims to simultaneously optimize multiple objectives associated with the multi-dimensional outcomes. Due to state transitions, balancing the vectorial outcomes to achieve near-optimality is challenging; in particular, contrary to the single-objective case, stationary policies are generally sub-optimal. We propose a no-regret algorithm based on the Frank-Wolfe algorithm (Frank and Wolfe 1956), UCRL2 (Jaksch et al. 2010), and a crucial and novel gradient threshold procedure. The procedure carefully delays gradient updates and returns a non-stationary policy that diversifies the outcomes to optimize the objectives.
Source Title: Neural Information Processing Systems (NeurIPS)
URI: https://scholarbank.nus.edu.sg/handle/10635/172670
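The abstract describes a Frank-Wolfe-style outer loop whose gradient updates are deliberately delayed, with the current gradient serving as reward weights for an optimistic RL subroutine. The following Python sketch illustrates only that general pattern; the `objective` (a smooth softmin), the numerical `gradient`, the `run_episode` stub standing in for a UCRL2-style subroutine, and the `threshold` rule are all hypothetical placeholders and not the paper's actual algorithm or guarantees.

```python
import numpy as np

def objective(x, tau=0.1):
    # Smooth softmin as a stand-in concave scalarization of the
    # average outcome vector (an assumption, not the paper's choice).
    return -tau * np.log(np.sum(np.exp(-x / tau)))

def gradient(x, eps=1e-6):
    # Numerical gradient of the scalarization, for illustration only.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (objective(x + e) - objective(x - e)) / (2 * eps)
    return g

def run_episode(weights, horizon, rng):
    # Placeholder for an optimistic RL subroutine (UCRL2-style) that would
    # approximately maximize the scalar reward <weights, outcome> and return
    # the average outcome vector it collected. Here we just sample a random
    # point on the simplex biased toward the weights; `horizon` is unused
    # in this stub.
    return rng.dirichlet(1.0 + 10.0 * np.maximum(weights, 0))

def frank_wolfe_rl(dim=3, episodes=50, horizon=100, threshold=0.2, seed=0):
    rng = np.random.default_rng(seed)
    x_bar = np.full(dim, 1.0 / dim)  # running average outcome
    g = gradient(x_bar)              # current gradient / reward weights
    for ep in range(episodes):
        # Gradient threshold idea: only refresh the reward weights when the
        # gradient has drifted enough, so each weight vector is followed
        # long enough for the subroutine's estimates to be meaningful.
        g_new = gradient(x_bar)
        if np.linalg.norm(g_new - g) > threshold * np.linalg.norm(g):
            g = g_new
        outcome = run_episode(g, horizon, rng)
        x_bar += (outcome - x_bar) / (ep + 1)  # update average outcome
    return x_bar, objective(x_bar)

if __name__ == "__main__":
    x, val = frank_wolfe_rl()
    print("average outcome:", x, "objective:", val)
```

Because each weight vector is held fixed across several episodes before the threshold triggers, the resulting policy is non-stationary over the whole run, which mirrors the abstract's point that stationary policies are generally sub-optimal for vectorial objectives.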
Appears in Collections: Staff Publications Elements
Files in This Item:
File | Size | Format | Access Settings | Version
---|---|---|---|---
multi-objective_RL_nips_full.pdf | 1.38 MB | Adobe PDF | OPEN | Post-print
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.