Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/150309
DC Field | Value | |
---|---|---|
dc.title | LEARNING TO MAKE DECISIONS WITH INCOMPLETE INFORMATION: REINFORCEMENT LEARNING, INFORMATION GEOMETRY, AND REAL-LIFE APPLICATIONS | |
dc.contributor.author | DEBABROTA BASU | |
dc.date.accessioned | 2018-12-31T18:00:35Z | |
dc.date.available | 2018-12-31T18:00:35Z | |
dc.date.issued | 2018-08-14 | |
dc.identifier.citation | DEBABROTA BASU (2018-08-14). LEARNING TO MAKE DECISIONS WITH INCOMPLETE INFORMATION: REINFORCEMENT LEARNING, INFORMATION GEOMETRY, AND REAL-LIFE APPLICATIONS. ScholarBank@NUS Repository. | |
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/150309 | |
dc.description.abstract | We investigate three scenarios of reinforcement learning where the reward function or the underlying process dynamics are not accurately known. In the first scenario, we develop two algorithms, COREIL and rCOREIL, that addresses a known-transition, unknown-reward MDP and apply them to the problem of self-driving database management systems. In the second scenario, we develop an algorithm, Megh, that addresses the known-cost, unknown-transition MDPs and apply Megh for live VM migration in medium-scale data centres. In the third scenario, we develop an information-geometric framework, BelMan, that addresses the unknown-reward, unknown-transition scenario. We analyse BelMan for the pure exploration, exploration-exploitation and two-phase scenarios of multi-armed bandits and apply it to online scheduling in a multiple-queue, multiple-server system with unknown service rates. We show that BelMan theoretically achieves asymptotic convergence while experimentally outperforms the state-of-the-art algorithms for Bernoulli service rates. Finally, we sketch an extension of the information geometric approach to unknown-transition, unknown-reward MDPs that links BelMan, linearly solvable Markov decision processes, and curiosity driven reinforcement learning. This analysis motivates further investigation of the exploration-exploitation trade-off in variants of reinforcement learning. | |
dc.language.iso | en | |
dc.subject | Reinforcement Learning, Multi-armed bandits, INFORMATION GEOMETRY, Database tuning, Live VM Migration | |
dc.type | Thesis | |
dc.contributor.department | COMPUTER SCIENCE | |
dc.contributor.supervisor | BRESSAN, STEPHANE | |
dc.description.degree | Ph.D | |
dc.description.degreeconferred | DOCTOR OF PHILOSOPHY | |
dc.identifier.orcid | 0000-0002-3204-2884 | |
Appears in Collections: | Ph.D Theses (Open) |
Show simple item record
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
BasuD.pdf | 19.68 MB | Adobe PDF | OPEN | None | View/Download |
Google ScholarTM
Check
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.