Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/150309
Title: LEARNING TO MAKE DECISIONS WITH INCOMPLETE INFORMATION: REINFORCEMENT LEARNING, INFORMATION GEOMETRY, AND REAL-LIFE APPLICATIONS
Authors: DEBABROTA BASU
ORCID iD: orcid.org/0000-0002-3204-2884
Keywords: Reinforcement Learning, Multi-armed Bandits, Information Geometry, Database Tuning, Live VM Migration
Issue Date: 14-Aug-2018
Citation: DEBABROTA BASU (2018-08-14). LEARNING TO MAKE DECISIONS WITH INCOMPLETE INFORMATION: REINFORCEMENT LEARNING, INFORMATION GEOMETRY, AND REAL-LIFE APPLICATIONS. ScholarBank@NUS Repository.
Abstract: We investigate three scenarios of reinforcement learning where the reward function or the underlying process dynamics are not accurately known. In the first scenario, we develop two algorithms, COREIL and rCOREIL, that address a known-transition, unknown-reward MDP and apply them to the problem of self-driving database management systems. In the second scenario, we develop an algorithm, Megh, that addresses known-cost, unknown-transition MDPs and apply Megh to live VM migration in medium-scale data centres. In the third scenario, we develop an information-geometric framework, BelMan, that addresses the unknown-reward, unknown-transition scenario. We analyse BelMan for the pure-exploration, exploration-exploitation, and two-phase scenarios of multi-armed bandits and apply it to online scheduling in a multiple-queue, multiple-server system with unknown service rates. We show that BelMan theoretically achieves asymptotic convergence while experimentally outperforming the state-of-the-art algorithms for Bernoulli service rates. Finally, we sketch an extension of the information-geometric approach to unknown-transition, unknown-reward MDPs that links BelMan, linearly solvable Markov decision processes, and curiosity-driven reinforcement learning. This analysis motivates further investigation of the exploration-exploitation trade-off in variants of reinforcement learning.
URI: http://scholarbank.nus.edu.sg/handle/10635/150309
Appears in Collections: Ph.D Theses (Open)
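The abstract's third scenario concerns the exploration-exploitation trade-off in multi-armed bandits with Bernoulli rewards. As background for that setting, here is a minimal sketch using Thompson sampling, a standard baseline rather than the thesis's BelMan algorithm; the arm means, horizon, and seed below are illustrative assumptions, not values from the thesis.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Play a Bernoulli multi-armed bandit for `horizon` rounds with
    Thompson sampling and return the total reward collected."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [1] * k  # Beta(1, 1) uniform prior on each arm's mean
    failures = [1] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean estimate from each arm's Beta posterior and
        # play the arm whose sample is largest (exploration emerges
        # from posterior uncertainty, exploitation from posterior mass).
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# With arms of mean 0.2, 0.5, 0.8, play concentrates on the 0.8 arm,
# so total reward over 2000 rounds lands well above the ~1000 expected
# from uniform random arm choice.
print(thompson_sampling([0.2, 0.5, 0.8], horizon=2000, seed=1))
```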
Files in This Item:
| File | Description | Size | Format | Access Settings | Version |
|---|---|---|---|---|---|
| BasuD.pdf | | 19.68 MB | Adobe PDF | OPEN | None |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.