Title: Single Trajectory Convergence in Reinforcement Learning Problems with Vectorial Rewards
Authors: Ewe Zi Yi
ORCID iD: orcid.org/0000-0002-1625-414X
Keywords: reinforcement learning, convex optimisation, vectorial rewards, concave rewards, Markov decision processes
Issue Date: 20-Aug-2021
Citation: Ewe Zi Yi (2021-08-20). Single Trajectory Convergence in Reinforcement Learning Problems with Vectorial Rewards. ScholarBank@NUS Repository.
Abstract: Reinforcement learning (RL) problems are well studied with scalar rewards (RL-SR), though this model is insufficient for capturing real-life scenarios with vectorial rewards. Various RL algorithms scalarise vectorial rewards into scalar rewards using a concave global reward function, transforming the problems into general RL with concave rewards (RL-CR) problems, which are then solved by maximising the concave scalar rewards with existing RL-SR algorithms. However, most such approaches only guarantee convergence to optimality when the reward function is applied across the expectation over multiple trajectories, and fail to guarantee any optimality when the function is applied to any single trajectory. We therefore introduce an algorithm that directly maximises the global reward function applied to a single trajectory, and show that this approach also guarantees optimality across multiple trajectories. Because its general framework generalises across different concave reward functions, the algorithm allows various RL-CR problems to be solved in a single unified way.
URI: https://scholarbank.nus.edu.sg/handle/10635/212697
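The distinction the abstract draws can be illustrated numerically. A minimal sketch (not from the thesis; the function and data below are hypothetical) shows why the two objectives differ: for a concave scalarisation f, Jensen's inequality gives f(E[R]) ≥ E[f(R)], so maximising f of the expected vectorial return across trajectories is a different, generally easier objective than maximising f on each single trajectory.

```python
import numpy as np

def f(r):
    # A hypothetical concave scalarisation: the minimum reward component
    # (a max-min fairness objective over the reward dimensions).
    return np.min(r)

rng = np.random.default_rng(0)
# Simulated vectorial returns from 1000 trajectories, each with 2 components.
returns = rng.uniform(0.0, 1.0, size=(1000, 2))

# f applied across the expectation over trajectories.
f_of_mean = f(returns.mean(axis=0))
# f applied to each single trajectory, then averaged.
mean_of_f = np.mean([f(r) for r in returns])

# Jensen's inequality for a concave f: the two objectives do not coincide.
assert f_of_mean >= mean_of_f
print(f_of_mean, mean_of_f)
```

Here the gap is substantial: each component averages about 0.5, so f of the mean is near 0.5, while the expected per-trajectory minimum of two uniforms is near 1/3. An algorithm optimal for the first objective can therefore say nothing about the second, which motivates optimising the single-trajectory objective directly.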
Appears in Collections: Master's Theses (Open)
Files in This Item:
EweZY.pdf (5.85 MB, Adobe PDF)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.