LEARNING TO MAKE DECISIONS WITH INCOMPLETE INFORMATION: REINFORCEMENT LEARNING, INFORMATION GEOMETRY, AND REAL-LIFE APPLICATIONS | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/150309

DC Field	Value
dc.title	LEARNING TO MAKE DECISIONS WITH INCOMPLETE INFORMATION: REINFORCEMENT LEARNING, INFORMATION GEOMETRY, AND REAL-LIFE APPLICATIONS
dc.contributor.author	DEBABROTA BASU
dc.date.accessioned	2018-12-31T18:00:35Z
dc.date.available	2018-12-31T18:00:35Z
dc.date.issued	2018-08-14
dc.identifier.citation	DEBABROTA BASU (2018-08-14). LEARNING TO MAKE DECISIONS WITH INCOMPLETE INFORMATION: REINFORCEMENT LEARNING, INFORMATION GEOMETRY, AND REAL-LIFE APPLICATIONS. ScholarBank@NUS Repository.
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/150309
dc.description.abstract	We investigate three scenarios of reinforcement learning where the reward function or the underlying process dynamics are not accurately known. In the first scenario, we develop two algorithms, COREIL and rCOREIL, that addresses a known-transition, unknown-reward MDP and apply them to the problem of self-driving database management systems. In the second scenario, we develop an algorithm, Megh, that addresses the known-cost, unknown-transition MDPs and apply Megh for live VM migration in medium-scale data centres. In the third scenario, we develop an information-geometric framework, BelMan, that addresses the unknown-reward, unknown-transition scenario. We analyse BelMan for the pure exploration, exploration-exploitation and two-phase scenarios of multi-armed bandits and apply it to online scheduling in a multiple-queue, multiple-server system with unknown service rates. We show that BelMan theoretically achieves asymptotic convergence while experimentally outperforms the state-of-the-art algorithms for Bernoulli service rates. Finally, we sketch an extension of the information geometric approach to unknown-transition, unknown-reward MDPs that links BelMan, linearly solvable Markov decision processes, and curiosity driven reinforcement learning. This analysis motivates further investigation of the exploration-exploitation trade-off in variants of reinforcement learning.
dc.language.iso	en
dc.subject	Reinforcement Learning, Multi-armed bandits, INFORMATION GEOMETRY, Database tuning, Live VM Migration
dc.type	Thesis
dc.contributor.department	COMPUTER SCIENCE
dc.contributor.supervisor	BRESSAN, STEPHANE
dc.description.degree	Ph.D
dc.description.degreeconferred	DOCTOR OF PHILOSOPHY
dc.identifier.orcid	0000-0002-3204-2884
Appears in Collections:	Ph.D Theses (Open)

Show simple item record

Files in This Item:

File	Description	Size	Format	Access Settings	Version
BasuD.pdf		19.68 MB	Adobe PDF	OPEN	None	View/Download

Google Scholar^TM

Check

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.