Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/214490
Title: MODEL-BASED REINFORCEMENT LEARNING FOR COMPLEX ENVIRONMENTS
Authors: MA XIAO
ORCID iD: orcid.org/0000-0001-5466-867X
Keywords: Reinforcement Learning, Representation Learning, Robotics
Issue Date: 30-Jun-2021
Citation: MA XIAO (2021-06-30). MODEL-BASED REINFORCEMENT LEARNING FOR COMPLEX ENVIRONMENTS. ScholarBank@NUS Repository.
Abstract: Deep reinforcement learning (DRL) has achieved great success in sophisticated games such as Atari and Go. Compared with classic planning methods that explicitly reason over handcrafted and potentially inaccurate dynamic models, standard DRL methods directly map raw observations to a policy, which reduces accumulated model-prediction error and maximizes the overall performance of the system. However, generalizing existing DRL methods to real-robot setups remains challenging. Unlike a game environment, where the observations are well defined and relatively simple, most real-world decision-making tasks require reasoning about a low-dimensional state embedded in high-dimensional, complex, partial observations. In this thesis, we focus on model-based reinforcement learning. We demonstrate that discriminatively learning structured dynamic models can significantly improve the overall performance of DRL algorithms in complex environments. We begin with the task of manipulating objects with complex dynamics, in particular deformable objects. Deformable object manipulation is challenging because such objects have infinitely many degrees of freedom (DoF) and highly non-linear dynamics. We present latent Graph dynamics for DefOrmable Object Manipulation (G-DOOM). G-DOOM explicitly imposes a structured graph state representation that approximates a deformable object as a sparse set of interacting keypoints, and learns a graph neural network that abstractly captures the geometry and interaction dynamics of the keypoints. This parameterization addresses both the high-DoF problem and the complex non-linear dynamics of a deformable object. We train a recurrent graph dynamics model, which captures the spatio-temporal keypoint interactions from images with object self-occlusions. For decision making, G-DOOM explicitly reasons over the learned graph dynamics.
We evaluate G-DOOM on a series of challenging rope and cloth manipulation tasks and show that it outperforms state-of-the-art (SOTA) methods. Trained with simulation data, G-DOOM transfers directly to a real robot. However, G-DOOM only considers simple and clean observations, while in most real-world scenarios the observations are complex and have a low signal-to-noise ratio. To address this, we introduce Contrastive Variational Reinforcement Learning (CVRL). CVRL learns a contrastive variational dynamic model discriminatively by maximizing the mutual information between the observations and latent states. Contrastive learning avoids modeling the observations in unnecessary detail, as commonly used generative models do, and significantly improves the robustness of the learned model. Rather than naively reasoning over the dynamic model for decision making, CVRL guides the search with a learned actor-critic network to further improve performance. In our experiments, CVRL significantly outperforms SOTA methods on a set of challenging Natural MuJoCo tasks and a robot box-pushing task with complex observations. Neither CVRL nor G-DOOM sufficiently tackles partial observability, which is common in real-world decision making. To this end, we introduce Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFRL encodes a differentiable particle filter algorithm as an explicit computational structure in the neural network policy, so that the policy reasons explicitly with partial observations over time and is trained end-to-end for decision making. Specifically, we additionally introduce Particle Filter Recurrent Neural Networks (PF-RNNs), which parameterize the discriminative particle filter by combining the strengths of the particle filter algorithm with standard RNNs. Empirically, DPFRL outperforms SOTA methods on a series of POMDP RL benchmarks with realistic observations.
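For readers unfamiliar with the particle filter that DPFRL builds on, the following is a minimal sketch of one step of a classical bootstrap particle filter (predict, reweight, resample). It is not the differentiable, discriminative filter described in the abstract: the 1-D state, the Gaussian motion and observation models, and the names `particle_filter_step`, `belief_mean`, `motion_noise`, and `obs_noise` are all assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, weights, action, observation,
                         motion_noise=0.1, obs_noise=0.5, rng=random):
    """One predict/update/resample step of a classical bootstrap particle filter.

    Hypothetical 1-D setup (not from the thesis): the state is a scalar
    position, the action is a displacement, and the observation is a
    noisy reading of the position.
    """
    # Predict: move every particle by the action plus Gaussian motion noise.
    predicted = [p + action + rng.gauss(0.0, motion_noise) for p in particles]

    # Update: reweight each particle by the Gaussian likelihood of the observation.
    likelihoods = [math.exp(-0.5 * ((observation - p) / obs_noise) ** 2)
                   for p in predicted]
    new_weights = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(new_weights)
    if total == 0.0:
        # Every particle is implausible; fall back to uniform weights.
        new_weights = [1.0 / len(predicted)] * len(predicted)
    else:
        new_weights = [w / total for w in new_weights]

    # Resample: draw particles in proportion to their weights, then reset
    # the weights to uniform.
    resampled = rng.choices(predicted, weights=new_weights, k=len(predicted))
    uniform = [1.0 / len(resampled)] * len(resampled)
    return resampled, uniform


def belief_mean(particles, weights):
    """Weighted mean of the particle set, a point summary of the belief."""
    return sum(p * w for p, w in zip(particles, weights))
```

Calling `particle_filter_step` repeatedly with the latest action and observation keeps a particle-based belief over the hidden state. The resampling step here is not differentiable, which is one reason a learned, end-to-end variant needs a different formulation.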
URI: https://scholarbank.nus.edu.sg/handle/10635/214490
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: thesis_ma_xiao.pdf
Size: 37.14 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None




Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.