Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/246253
DC Field | Value
---|---
dc.title | HIERARCHICAL REINFORCEMENT LEARNING WITH PARAMETERIZED OPTIONS FOR LONG-HORIZON ROBOTIC MANIPULATION
dc.contributor.author | GUO CHAOQUN
dc.date.accessioned | 2023-11-30T18:00:35Z
dc.date.available | 2023-11-30T18:00:35Z
dc.date.issued | 2023-08-08
dc.identifier.citation | GUO CHAOQUN (2023-08-08). HIERARCHICAL REINFORCEMENT LEARNING WITH PARAMETERIZED OPTIONS FOR LONG-HORIZON ROBOTIC MANIPULATION. ScholarBank@NUS Repository.
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/246253
dc.description.abstract | Hierarchical Reinforcement Learning (HRL) is a promising approach for addressing long-horizon robotic manipulation tasks with sparse rewards. In the parameterized options framework of HRL, a high-level policy selects a skill and its corresponding low-level goal parameters from a pre-trained skill library, allowing skills to be shared across tasks. However, fixed skills can lead to poor performance when they fail to generalize. This work introduces a novel hierarchical algorithm for jointly training the two-level policies of the parameterized options framework under sparse reward settings. The thesis makes three key contributions. First, a skill library is developed using off-the-shelf RL algorithms for quick learning of simple actions, highlighting the importance of joint policy training for skill generalization across tasks. Second, the thesis presents a novel hierarchical architecture, Hier-P-DQN, and incorporates high-level active demonstration to ensure stable learning. Lastly, staged sparse rewards and high-level hindsight experience replay (HER) are used to expedite learning. Through extensive experimentation, Hier-P-DQN outperforms baseline methods such as DDPG+HER and behavioral cloning on long-horizon robotic manipulation tasks with sparse rewards. It achieves strong performance with significantly fewer environment interactions, requiring only 1e4 to 1.5e4 episodes, far fewer than traditional RL methods require. Additionally, the high-level demonstrations it uses are easier to obtain than those required by traditional approaches.
dc.language.iso | en
dc.subject | reinforcement learning, robotic manipulation, active demonstration, hierarchical reinforcement learning
dc.type | Thesis
dc.contributor.department | MECHANICAL ENGINEERING
dc.contributor.supervisor | Chee Meng Chew
dc.description.degree | Master's
dc.description.degreeconferred | MASTER OF ENGINEERING (CDE)
dc.identifier.orcid | 0009-0006-2515-8113
Appears in Collections: | Master's Theses (Open)
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
GuoChaoqun.pdf |  | 10.41 MB | Adobe PDF | OPEN | None
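For readers unfamiliar with the parameterized options framework named in the abstract, the sketch below illustrates the hybrid action structure such a high-level policy outputs: a discrete choice of skill from a pre-trained library together with continuous goal parameters for that skill. This is a minimal, hypothetical Python illustration of the general framework, not the thesis's Hier-P-DQN implementation; all names (`Skill`, `ParameterizedOption`, `select_option`) are assumptions made for illustration.

```python
# Hypothetical sketch of a parameterized-option (hybrid discrete-continuous)
# action: a discrete skill index plus continuous goal parameters for that skill.
# Illustrative only; not the thesis's Hier-P-DQN implementation.

from dataclasses import dataclass
from typing import Callable, Sequence
import random


@dataclass
class Skill:
    """A pre-trained low-level skill with a fixed-dimensional parameter space."""
    name: str
    param_dim: int                                 # e.g. a 3-D goal position for "reach"
    policy: Callable[[Sequence[float]], None]      # pre-trained low-level controller


@dataclass
class ParameterizedOption:
    """The hybrid action the high-level policy outputs at each decision step."""
    skill_id: int               # discrete choice over the skill library
    params: Sequence[float]     # continuous goal parameters for the chosen skill


def select_option(skill_library: Sequence[Skill]) -> ParameterizedOption:
    """Stand-in for a learned high-level policy: here, a random selection."""
    skill_id = random.randrange(len(skill_library))
    params = [random.uniform(-1.0, 1.0)
              for _ in range(skill_library[skill_id].param_dim)]
    return ParameterizedOption(skill_id, params)


if __name__ == "__main__":
    library = [
        Skill("reach", 3, lambda p: print(f"reach -> {p}")),
        Skill("grasp", 1, lambda p: print(f"grasp -> {p}")),
    ]
    option = select_option(library)
    library[option.skill_id].policy(option.params)
```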