Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/216501
DC Field | Value
---|---
dc.title | ONLINE RESOURCE ALLOCATION AND ITS APPLICATIONS
dc.contributor.author | ZHU QIUYU
dc.date.accessioned | 2022-02-28T18:00:28Z
dc.date.available | 2022-02-28T18:00:28Z
dc.date.issued | 2021-11-18
dc.identifier.citation | ZHU QIUYU (2021-11-18). ONLINE RESOURCE ALLOCATION AND ITS APPLICATIONS. ScholarBank@NUS Repository.
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/216501
dc.description.abstract | Online resource allocation (ORA) is one of the most important problems in operations research. This thesis focuses on algorithm design for different ORA models. In Chapter 2, we study the online resource allocation problem in which the resources are substitutable in two directions. To tackle the complicated substitution effect introduced by the multidimensional substitution, we propose the Frontier Inventory Balancing (FIB) algorithm, motivated by the dynamic programming formulation and a closed-form solution of the linear programming approximation. We provide comprehensive competitive ratio analyses and extensive numerical studies for the proposed algorithms; simulation studies show that our algorithm outperforms other state-of-the-art algorithms. In Chapter 3, we study this problem further by introducing learning into the setting, so that the arrival information is not known to the decision-maker. We generalize the FIB algorithm to this setting and provide theoretical results; the unknown arrival probability adds extra difficulty to the analyses. Extensive numerical studies compare the performance of the different algorithms. The learning involved in Chapter 3 is limited because the decision-maker's actions do not affect the learning. We therefore study another resource allocation model, the multi-armed bandit (MAB), in Chapters 4 and 5. The MAB is a classical problem that exemplifies the exploration-exploitation trade-off. Standard MAB formulations do not take risk into account, yet risk is a primary concern in online decision-making systems; the mean-variance and CVaR risk measures are the most common risk-aware objective functions. Existing algorithms for risk-aware MABs make unrealistic assumptions about the reward distributions. We develop Thompson Sampling-style algorithms for mean-variance and CVaR MABs and provide comprehensive regret analyses. Our algorithms achieve the best-known regret bounds for risk-aware MABs and attain the information-theoretic bounds in some parameter regimes. Empirical simulations show that our algorithms significantly outperform existing LCB-based algorithms.
dc.language.iso | en
dc.subject | online resource allocation, learning, multi-armed bandit, mean-variance, conditional value at risk, algorithm design
dc.type | Thesis
dc.contributor.department | INST OF OPERATIONS RESEARCH & ANALYTICS
dc.contributor.supervisor | Andrew Ee Beng Lim
dc.contributor.supervisor | Tan Yan Fu, Vincent
dc.description.degree | Ph.D
dc.description.degreeconferred | DOCTOR OF PHILOSOPHY (IORA)
dc.identifier.orcid | 0000-0002-3293-3521
Appears in Collections: | Ph.D Theses (Open) |
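The abstract describes Thompson Sampling-style algorithms for the MAB exploration-exploitation trade-off. As context for readers unfamiliar with that trade-off, here is a minimal, generic Beta-Bernoulli Thompson Sampling sketch — an illustrative textbook baseline only, not the thesis's risk-aware (mean-variance or CVaR) algorithms, and all names below are hypothetical:

```python
import random

def thompson_sampling(true_means, horizon, rng):
    """Generic Beta-Bernoulli Thompson Sampling (illustrative only).

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    mean reward. Every round, one plausible mean is sampled from each
    posterior and the arm with the largest sample is pulled, so exploration
    and exploitation happen in a single randomized step.
    Returns the cumulative pseudo-regret against the best arm.
    """
    k = len(true_means)
    wins = [0] * k      # observed successes per arm
    losses = [0] * k    # observed failures per arm
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # Draw a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret

rng = random.Random(42)
r = thompson_sampling([0.3, 0.5, 0.7], horizon=5000, rng=rng)
print(f"cumulative pseudo-regret after 5000 rounds: {r:.1f}")
```

Under this setup, pulling arms uniformly at random would accrue roughly 0.2 regret per round (about 1000 over 5000 rounds), whereas Thompson Sampling concentrates pulls on the best arm and keeps cumulative regret far smaller. The thesis's contribution is to extend this style of posterior-sampling algorithm to risk-aware objectives.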
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
ZhuQY.pdf | | 5.69 MB | Adobe PDF | OPEN | None