Please use this identifier to cite or link to this item:
https://scholarbank.nus.edu.sg/handle/10635/246253
DC Field | Value
---|---
dc.title | HIERARCHICAL REINFORCEMENT LEARNING WITH PARAMETERIZED OPTIONS FOR LONG-HORIZON ROBOTIC MANIPULATION
dc.contributor.author | GUO CHAOQUN
dc.date.accessioned | 2023-11-30T18:00:35Z
dc.date.available | 2023-11-30T18:00:35Z
dc.date.issued | 2023-08-08
dc.identifier.citation | GUO CHAOQUN (2023-08-08). HIERARCHICAL REINFORCEMENT LEARNING WITH PARAMETERIZED OPTIONS FOR LONG-HORIZON ROBOTIC MANIPULATION. ScholarBank@NUS Repository.
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/246253
dc.description.abstract | Hierarchical Reinforcement Learning (HRL) is a promising approach for addressing long-horizon robotic manipulation tasks with sparse rewards. In the parameterized options framework of HRL, a high-level policy selects a skill and its corresponding low-level goal parameters from a pre-trained skill library, allowing skills to be shared across tasks. However, fixed skills can lead to poor performance when they fail to generalize. This work introduces a novel hierarchical algorithm for jointly training the two-level policies of the parameterized options framework under sparse reward settings. The thesis makes three key contributions. First, a skill library is developed using off-the-shelf RL algorithms for quick learning of simple actions, highlighting the importance of joint policy training for skill generalization across tasks. Second, the thesis presents a novel hierarchical architecture, Hier-P-DQN, and incorporates high-level active demonstration to ensure stable learning. Lastly, staged sparse rewards and high-level hindsight experience replay (HER) are used to expedite learning. Through extensive experimentation, Hier-P-DQN outperforms baseline methods such as DDPG+HER and behavioral cloning on long-horizon robotic manipulation tasks with sparse rewards. It achieves strong performance with significantly fewer environment interactions, requiring only 1e4 to 1.5e4 episodes, far fewer than traditional RL methods require. Additionally, the high-level demonstrations it uses are easier to obtain than those required by traditional approaches.
dc.language.iso | en
dc.subject | reinforcement learning, robotic manipulation, active demonstration, hierarchical reinforcement learning
dc.type | Thesis
dc.contributor.department | MECHANICAL ENGINEERING
dc.contributor.supervisor | Chee Meng Chew
dc.description.degree | Master's
dc.description.degreeconferred | MASTER OF ENGINEERING (CDE)
dc.identifier.orcid | 0009-0006-2515-8113
Appears in Collections: | Master's Theses (Open)
Files in This Item:
File | Description | Size | Format | Access Settings | Version
---|---|---|---|---|---
GuoChaoqun.pdf |  | 10.41 MB | Adobe PDF | OPEN | None
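For readers unfamiliar with the parameterized options framework named in the abstract, the sketch below illustrates the hybrid action structure such a high-level policy outputs: a discrete choice of skill from a pre-trained library together with continuous goal parameters for that skill. This is a minimal, hypothetical Python illustration of the general framework, not the thesis's Hier-P-DQN implementation; all names (`Skill`, `ParameterizedOption`, `select_option`) are assumptions made for illustration.

```python
# Hypothetical sketch of a parameterized-option (hybrid discrete-continuous)
# action: a discrete skill index plus continuous goal parameters for that skill.
# Illustrative only; not the thesis's Hier-P-DQN implementation.

from dataclasses import dataclass
from typing import Callable, Sequence
import random


@dataclass
class Skill:
    """A pre-trained low-level skill with a fixed-dimensional parameter space."""
    name: str
    param_dim: int                                 # e.g. a 3-D goal position for "reach"
    policy: Callable[[Sequence[float]], None]      # pre-trained low-level controller


@dataclass
class ParameterizedOption:
    """The hybrid action the high-level policy outputs at each decision step."""
    skill_id: int               # discrete choice over the skill library
    params: Sequence[float]     # continuous goal parameters for the chosen skill


def select_option(skill_library: Sequence[Skill]) -> ParameterizedOption:
    """Stand-in for a learned high-level policy: here, a random selection."""
    skill_id = random.randrange(len(skill_library))
    params = [random.uniform(-1.0, 1.0)
              for _ in range(skill_library[skill_id].param_dim)]
    return ParameterizedOption(skill_id, params)


if __name__ == "__main__":
    library = [
        Skill("reach", 3, lambda p: print(f"reach -> {p}")),
        Skill("grasp", 1, lambda p: print(f"grasp -> {p}")),
    ]
    option = select_option(library)
    library[option.skill_id].policy(option.params)
```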