Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/216501
DC Field: Value
dc.title: ONLINE RESOURCE ALLOCATION AND ITS APPLICATIONS
dc.contributor.author: ZHU QIUYU
dc.date.accessioned: 2022-02-28T18:00:28Z
dc.date.available: 2022-02-28T18:00:28Z
dc.date.issued: 2021-11-18
dc.identifier.citation: ZHU QIUYU (2021-11-18). ONLINE RESOURCE ALLOCATION AND ITS APPLICATIONS. ScholarBank@NUS Repository.
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/216501
dc.description.abstract: Online resource allocation (ORA) is one of the most important problems in operations research. This thesis focuses on algorithm design for several ORA models. In Chapter 2, we study an online resource allocation problem in which the resources are substitutable in two directions. To tackle the complicated substitution effect introduced by multidimensional substitution, we propose the Frontier Inventory Balancing (FIB) algorithm, motivated by the dynamic programming formulation and a closed-form solution of the linear programming approximation. We provide comprehensive competitive ratio analyses and extensive numerical studies for the proposed algorithms. Simulation studies show that our algorithm outperforms other state-of-the-art algorithms. In Chapter 3, we extend this problem by introducing 'learning' into the setting: the arrival information is not known to the decision-maker. We generalize the FIB algorithm to this setting and provide theoretical results; the unknown arrival probability introduces additional difficulties into the analyses. Extensive numerical studies compare the performance of the different algorithms. The learning involved in Chapter 3 is limited because the decision-maker's actions do not affect the learning. In this regard, we study another resource allocation model, the multi-armed bandit (MAB), in Chapters 4 and 5. The MAB is a classical problem that exemplifies the exploration-exploitation trade-off. Standard formulations of the MAB do not take risk into account, yet in online decision-making systems risk is a primary concern; the mean-variance and CVaR risk measures are the most common risk-aware objective functions. Existing algorithms for risk-aware MABs make unrealistic assumptions on the reward distributions. We develop Thompson Sampling-style algorithms for mean-variance and CVaR MABs and provide comprehensive regret analyses. Our algorithms achieve the best-known regret bounds for risk-aware MABs and attain the information-theoretic bounds in some parameter regimes. Empirical simulations show that our algorithms significantly outperform existing LCB-based algorithms.
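The Thompson Sampling-style algorithms described in the abstract build on the classical Bernoulli Thompson Sampling baseline for the MAB. Below is a minimal sketch of that standard baseline only, not of the thesis's risk-aware (mean-variance or CVaR) variants; the arm probabilities, horizon, and Beta(1, 1) priors are illustrative assumptions.

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Classical Bernoulli Thompson Sampling with Beta(1, 1) priors.

    arm_probs: true success probabilities of each arm (unknown to the
    algorithm; used only to simulate rewards). Returns the total reward
    collected over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    alpha = [1] * k  # posterior Beta alpha (successes + 1)
    beta = [1] * k   # posterior Beta beta (failures + 1)
    total = 0
    for _ in range(horizon):
        # Exploration and exploitation happen implicitly: draw one
        # sample from each arm's posterior and play the best sample.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        i = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < arm_probs[i] else 0
        alpha[i] += reward
        beta[i] += 1 - reward
        total += reward
    return total
```

For example, with arms of probability 0.2 and 0.8 over 2000 rounds, the sampler concentrates play on the better arm, so the average per-round reward approaches 0.8.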
dc.language.iso: en
dc.subject: online resource allocation, learning, multi-armed bandit, mean-variance, conditional value at risk, algorithm design
dc.type: Thesis
dc.contributor.department: INST OF OPERATIONS RESEARCH & ANALYTICS
dc.contributor.supervisor: Andrew Ee Beng Lim
dc.contributor.supervisor: Tan Yan Fu, Vincent
dc.description.degree: Ph.D
dc.description.degreeconferred: DOCTOR OF PHILOSOPHY (IORA)
dc.identifier.orcid: 0000-0002-3293-3521
Appears in Collections:Ph.D Theses (Open)

Files in This Item:
ZhuQY.pdf (5.69 MB, Adobe PDF, open access)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.