Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/150309
DC FieldValue
dc.titleLEARNING TO MAKE DECISIONS WITH INCOMPLETE INFORMATION: REINFORCEMENT LEARNING, INFORMATION GEOMETRY, AND REAL-LIFE APPLICATIONS
dc.contributor.authorDEBABROTA BASU
dc.date.accessioned2018-12-31T18:00:35Z
dc.date.available2018-12-31T18:00:35Z
dc.date.issued2018-08-14
dc.identifier.citationDEBABROTA BASU (2018-08-14). LEARNING TO MAKE DECISIONS WITH INCOMPLETE INFORMATION: REINFORCEMENT LEARNING, INFORMATION GEOMETRY, AND REAL-LIFE APPLICATIONS. ScholarBank@NUS Repository.
dc.identifier.urihttp://scholarbank.nus.edu.sg/handle/10635/150309
dc.description.abstractWe investigate three scenarios of reinforcement learning where the reward function or the underlying process dynamics are not accurately known. In the first scenario, we develop two algorithms, COREIL and rCOREIL, that addresses a known-transition, unknown-reward MDP and apply them to the problem of self-driving database management systems. In the second scenario, we develop an algorithm, Megh, that addresses the known-cost, unknown-transition MDPs and apply Megh for live VM migration in medium-scale data centres. In the third scenario, we develop an information-geometric framework, BelMan, that addresses the unknown-reward, unknown-transition scenario. We analyse BelMan for the pure exploration, exploration-exploitation and two-phase scenarios of multi-armed bandits and apply it to online scheduling in a multiple-queue, multiple-server system with unknown service rates. We show that BelMan theoretically achieves asymptotic convergence while experimentally outperforms the state-of-the-art algorithms for Bernoulli service rates. Finally, we sketch an extension of the information geometric approach to unknown-transition, unknown-reward MDPs that links BelMan, linearly solvable Markov decision processes, and curiosity driven reinforcement learning. This analysis motivates further investigation of the exploration-exploitation trade-off in variants of reinforcement learning.
dc.language.isoen
dc.subjectReinforcement Learning, Multi-armed bandits, INFORMATION GEOMETRY, Database tuning, Live VM Migration
dc.typeThesis
dc.contributor.departmentCOMPUTER SCIENCE
dc.contributor.supervisorBRESSAN, STEPHANE
dc.description.degreePh.D
dc.description.degreeconferredDOCTOR OF PHILOSOPHY
dc.identifier.orcid0000-0002-3204-2884
Appears in Collections:Ph.D Theses (Open)

Show simple item record
Files in This Item:
File Description SizeFormatAccess SettingsVersion 
BasuD.pdf19.68 MBAdobe PDF

OPEN

NoneView/Download

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.