Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/212697
Title: SINGLE TRAJECTORY CONVERGENCE IN REINFORCEMENT LEARNING PROBLEMS WITH VECTORIAL REWARDS
Authors: EWE ZI YI
ORCID iD: orcid.org/0000-0002-1625-414X
Keywords: reinforcement learning, convex optimisation, vectorial rewards, concave rewards, Markov decision processes
Issue Date: 20-Aug-2021
Citation: EWE ZI YI (2021-08-20). SINGLE TRAJECTORY CONVERGENCE IN REINFORCEMENT LEARNING PROBLEMS WITH VECTORIAL REWARDS. ScholarBank@NUS Repository.
Abstract: Reinforcement learning (RL) problems are well studied with scalar rewards (RL-SR), but this model is insufficient for capturing real-life scenarios with vectorial rewards. Various RL algorithms scalarise vectorial rewards into scalar rewards using a concave global reward function, transforming the problem into a general RL with concave rewards (RL-CR) problem, which is then solved by maximising the concave scalar reward using existing RL-SR algorithms. However, most approaches guarantee convergence to optimality only when the reward function is applied to the expectation over multiple trajectories, and fail to guarantee any optimality when the function is applied along a single trajectory. We therefore introduce an algorithm that directly maximises the global reward function applied to a single trajectory, and show that this approach also guarantees optimality across multiple trajectories. This algorithm, built on a general framework that accommodates different concave reward functions, allows various RL-CR problems to be solved in a single unified way.
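The distinction the abstract draws between applying the concave reward function to the expectation over trajectories and applying it along a single trajectory can be illustrated numerically. The sketch below is a minimal, hypothetical example, not taken from the thesis: it assumes a log-sum scalarisation as the concave global reward function and simulated two-component vector rewards, and shows that the two objectives generally differ (by Jensen's inequality for concave functions).

```python
import numpy as np

# Hypothetical concave global reward function: log-sum over the reward
# components (a common fairness-style scalarisation; the thesis does not
# prescribe this particular choice).
def global_reward(v):
    return np.sum(np.log(v))

rng = np.random.default_rng(0)

# Simulated cumulative vector rewards from several trajectories,
# each trajectory yielding a two-component reward vector.
trajectory_rewards = rng.uniform(0.5, 2.0, size=(1000, 2))

# (1) Scalarise the expectation over trajectories: f(E[R]).
f_of_expectation = global_reward(trajectory_rewards.mean(axis=0))

# (2) Expected scalarised reward of a single trajectory: E[f(R)].
expectation_of_f = np.mean([global_reward(r) for r in trajectory_rewards])

# For concave f, Jensen's inequality gives E[f(R)] <= f(E[R]), so an
# algorithm optimal for one objective need not be optimal for the other.
print(f_of_expectation, expectation_of_f)
```

This gap is why convergence guarantees stated for the expectation-level objective do not automatically carry over to the single-trajectory objective that the thesis addresses.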
URI: https://scholarbank.nus.edu.sg/handle/10635/212697
Appears in Collections:Master's Theses (Open)

Files in This Item:
File: EweZY.pdf | Size: 5.85 MB | Format: Adobe PDF | Access: Open | Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.