Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/216501
DC Field: Value
dc.title: ONLINE RESOURCE ALLOCATION AND ITS APPLICATIONS
dc.contributor.author: ZHU QIUYU
dc.date.accessioned: 2022-02-28T18:00:28Z
dc.date.available: 2022-02-28T18:00:28Z
dc.date.issued: 2021-11-18
dc.identifier.citation: ZHU QIUYU (2021-11-18). ONLINE RESOURCE ALLOCATION AND ITS APPLICATIONS. ScholarBank@NUS Repository.
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/216501
dc.description.abstract: Online resource allocation (ORA) is one of the most important problems in operations research. This thesis focuses on algorithm design for several ORA models. In Chapter 2, we study an online resource allocation problem in which the resources are substitutable in two directions. To tackle the complicated substitution effect introduced by multidimensional substitution, we propose the Frontier Inventory Balancing (FIB) algorithm, motivated by the dynamic programming formulation and a closed-form solution of the linear programming approximation. We provide comprehensive competitive ratio analyses and extensive numerical studies for the proposed algorithms. Simulation studies show that our algorithm outperforms other state-of-the-art algorithms. In Chapter 3, we extend this problem by introducing 'learning' into the setting: the arrival information is not known to the decision-maker. We generalize the FIB algorithm to this setting and provide theoretical results; the unknown arrival probability introduces additional difficulties into the analyses. Extensive numerical studies compare the performance of the different algorithms. The learning involved in Chapter 3 is limited because the decision-maker's actions do not affect the learning. In this regard, we study another resource allocation model, the multi-armed bandit (MAB), in Chapters 4 and 5. The MAB is a classical problem that exemplifies the exploration-exploitation trade-off. Standard formulations of the MAB do not take risk into account, yet in online decision-making systems risk is a primary concern; the mean-variance and CVaR risk measures are the most common risk-aware objective functions. Existing algorithms for risk-aware MABs make unrealistic assumptions on the reward distributions. We develop Thompson Sampling-style algorithms for mean-variance and CVaR MABs and provide comprehensive regret analyses. Our algorithms achieve the best-known regret bounds for risk-aware MABs and attain the information-theoretic bounds in some parameter regimes. Empirical simulations show that our algorithms significantly outperform existing LCB-based algorithms.
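The Thompson Sampling-style algorithms described in the abstract build on the classical Bernoulli Thompson Sampling baseline for the MAB. Below is a minimal sketch of that standard baseline only, not of the thesis's risk-aware (mean-variance or CVaR) variants; the arm probabilities, horizon, and Beta(1, 1) priors are illustrative assumptions.

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Classical Bernoulli Thompson Sampling with Beta(1, 1) priors.

    arm_probs: true success probabilities of each arm (unknown to the
    algorithm; used only to simulate rewards). Returns the total reward
    collected over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    alpha = [1] * k  # posterior Beta alpha (successes + 1)
    beta = [1] * k   # posterior Beta beta (failures + 1)
    total = 0
    for _ in range(horizon):
        # Exploration and exploitation happen implicitly: draw one
        # sample from each arm's posterior and play the best sample.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        i = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < arm_probs[i] else 0
        alpha[i] += reward
        beta[i] += 1 - reward
        total += reward
    return total
```

For example, with arms of probability 0.2 and 0.8 over 2000 rounds, the sampler concentrates play on the better arm, so the average per-round reward approaches 0.8.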
dc.language.iso: en
dc.subject: online resource allocation, learning, multi-armed bandit, mean-variance, conditional value at risk, algorithm design
dc.type: Thesis
dc.contributor.department: INST OF OPERATIONS RESEARCH & ANALYTICS
dc.contributor.supervisor: Andrew Ee Beng Lim
dc.contributor.supervisor: Tan Yan Fu, Vincent
dc.description.degree: Ph.D
dc.description.degreeconferred: DOCTOR OF PHILOSOPHY (IORA)
dc.identifier.orcid: 0000-0002-3293-3521
Appears in Collections:Ph.D Theses (Open)

Files in This Item:
ZhuQY.pdf (5.69 MB, Adobe PDF, open access)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.