HIERARCHICAL REINFORCEMENT LEARNING WITH PARAMETERIZED OPTIONS FOR LONG-HORIZON ROBOTIC MANIPULATION | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/246253

Title:	HIERARCHICAL REINFORCEMENT LEARNING WITH PARAMETERIZED OPTIONS FOR LONG-HORIZON ROBOTIC MANIPULATION
Authors:	GUO CHAOQUN
ORCID iD:	orcid.org/0009-0006-2515-8113
Keywords:	reinforcement learning, robotic manipulation, active demonstration, hierarchical reinforcement learning
Issue Date:	8-Aug-2023
Citation:	GUO CHAOQUN (2023-08-08). HIERARCHICAL REINFORCEMENT LEARNING WITH PARAMETERIZED OPTIONS FOR LONG-HORIZON ROBOTIC MANIPULATION. ScholarBank@NUS Repository.
Abstract:	Hierarchical Reinforcement Learning (HRL) is a promising approach for addressing long-horizon robotic manipulation tasks with sparse rewards. In the parameterized options framework of HRL, a high-level policy selects a skill and its corresponding low-level goal parameters from a pre-trained skill library, allowing skills to be shared across tasks. However, fixed skills can lead to poor performance when they fail to generalize. This work introduces a novel hierarchical algorithm for joint training of two-level policies in the parameterized options framework under sparse reward settings. Three key contributions are made in this thesis. First, a skill library is developed using off-the-shelf RL algorithms for quick learning of simple actions, emphasizing the importance of joint policy training for skill generalization across tasks. Second, the thesis presents a novel hierarchical architecture, Hier-P-DQN, and incorporates high-level active demonstration to ensure stable learning. Lastly, staged sparse rewards and high-level hindsight experience replay (HER) are used to expedite learning. In extensive experiments, Hier-P-DQN outperforms baseline methods such as DDPG+HER and behavioral cloning in long-horizon robotic manipulation tasks with sparse rewards. It achieves this performance with significantly fewer environment interactions, requiring only 1e4 to 1.5e4 episodes, far fewer than traditional RL methods. Additionally, high-level demonstrations are easier to obtain than the demonstrations required by traditional approaches.
URI:	https://scholarbank.nus.edu.sg/handle/10635/246253
Appears in Collections:	Master's Theses (Open)
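
The abstract describes a high-level policy that picks a skill together with that skill's goal parameters. The following is a minimal Python sketch of that selection step in the general style of parameterized-action Q-learning; it is not the thesis's Hier-P-DQN implementation. The skill names, the one-dimensional state layout, and the hand-written `toy_q` function are all hypothetical stand-ins for learned networks.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

State = Sequence[float]
Params = List[float]


def select_option(
    state: State,
    param_policies: Dict[str, Callable[[State], Params]],
    q_value: Callable[[State, str, Params], float],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[str, Params]:
    """Pick a (skill, goal parameters) pair for the given state."""
    rng = rng or random.Random()
    # Every skill proposes its own continuous goal parameters for this state.
    candidates = {name: policy(state) for name, policy in param_policies.items()}
    if rng.random() < epsilon:
        # Exploration: choose a skill uniformly at random.
        name = rng.choice(sorted(candidates))
    else:
        # Exploitation: choose the skill whose proposal has the highest Q-value.
        name = max(candidates, key=lambda k: q_value(state, k, candidates[k]))
    return name, candidates[name]


# Hypothetical toy setup: state = (gripper_pos, object_pos, target_pos) on a line.
TOY_PARAM_POLICIES: Dict[str, Callable[[State], Params]] = {
    "reach": lambda s: [s[1]],  # goal parameter: move to the object position
    "place": lambda s: [s[2]],  # goal parameter: move to the target position
}


def toy_q(state: State, skill: str, params: Params) -> float:
    """Hand-written stand-in for a learned Q-network (assumption)."""
    gripper, obj, _target = state
    at_object = abs(gripper - obj) < 1e-6
    if skill == "reach":
        return 0.0 if at_object else 1.0
    return 1.0 if at_object else 0.0


if __name__ == "__main__":
    print(select_option((0.0, 0.5, 0.9), TOY_PARAM_POLICIES, toy_q))
    print(select_option((0.5, 0.5, 0.9), TOY_PARAM_POLICIES, toy_q))
```

With `epsilon` left at 0.0 the choice is greedy, so the toy setup selects "reach" while the gripper is away from the object and "place" once it has arrived. In a jointly trained system, both the parameter proposals and the Q-values would come from learned models rather than fixed functions.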

Files in This Item:

File	Description	Size	Format	Access Settings	Version
GuoChaoqun.pdf		10.41 MB	Adobe PDF	OPEN	None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.