Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/214490
DC Field | Value
dc.title | MODEL-BASED REINFORCEMENT LEARNING FOR COMPLEX ENVIRONMENTS
dc.contributor.author | MA XIAO
dc.date.accessioned | 2022-01-31T18:00:42Z
dc.date.available | 2022-01-31T18:00:42Z
dc.date.issued | 2021-06-30
dc.identifier.citation | MA XIAO (2021-06-30). MODEL-BASED REINFORCEMENT LEARNING FOR COMPLEX ENVIRONMENTS. ScholarBank@NUS Repository.
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/214490
dc.description.abstract | Deep reinforcement learning (DRL) has achieved great success in sophisticated games such as Atari and Go. Compared with classic planning methods, which explicitly reason over handcrafted and potentially inaccurate dynamics models, standard DRL methods map raw observations directly to a policy, which reduces accumulated model-prediction error and maximizes the overall performance of the system. However, generalizing existing DRL methods to real-robot setups remains challenging. Unlike a game environment, where the observations are well defined and relatively simple, most real-world decision-making tasks require reasoning about a low-dimensional state embedded in high-dimensional, complex, partial observations. In this thesis, we focus on model-based reinforcement learning. We demonstrate that, through discriminative learning of structured dynamics models, the performance of DRL algorithms in complex environments can be significantly improved.

We begin with the task of manipulating objects with complex dynamics, in particular deformable objects. Deformable object manipulation is challenging because of the objects' infinite degrees of freedom (DoF) and highly non-linear dynamics. We present latent Graph dynamics for DefOrmable Object Manipulation (G-DOOM). G-DOOM imposes a structured graph state representation that approximates a deformable object as a sparse set of interacting keypoints and learns a graph neural network that abstractly captures the geometry and interaction dynamics of the keypoints. This parameterization tackles both the high DoF and the complex non-linear dynamics of a deformable object. We train a recurrent graph dynamics model that captures the spatio-temporal keypoint interactions from images with object self-occlusions. For decision making, G-DOOM explicitly reasons over the learned graph dynamics. We evaluate G-DOOM on a series of challenging rope and cloth manipulation tasks and show that it outperforms state-of-the-art methods. Trained with simulation data, G-DOOM transfers directly to a real robot.

However, G-DOOM only considers simple and clean observations, while in most real-world scenarios the observations are complex and have a low signal-to-noise ratio. We therefore introduce Contrastive Variational Reinforcement Learning (CVRL). CVRL learns a contrastive variational dynamics model discriminatively by maximizing the mutual information between observations and latent states. Contrastive learning avoids modeling the observations in unnecessary detail, as commonly used generative models do, and significantly improves the robustness of the learned model. Rather than naively reasoning over the dynamics model for decision making, CVRL guides the search with a learned actor-critic network to further improve performance. In our experiments, CVRL significantly outperforms state-of-the-art methods on a set of challenging Natural MuJoCo tasks and on a robot box-pushing task with complex observations.

Neither CVRL nor G-DOOM sufficiently tackles partial observability, which is common in real-world decision making. To this end, we introduce Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFRL encodes a differentiable particle filter as an explicit computational structure in the neural network policy, enabling explicit reasoning with partial observations over time, and is trained end-to-end for decision making. Specifically, we additionally propose Particle Filter Recurrent Neural Networks (PF-RNNs), which parameterize the discriminative particle filter by combining the strengths of the particle filter algorithm with those of standard RNNs. Empirically, DPFRL outperforms state-of-the-art methods on a series of POMDP RL benchmarks with realistic observations.
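
To make the structured-graph idea concrete, here is a minimal sketch of a keypoint graph dynamics model of the kind described for G-DOOM. The class name, feature dimensions, and fully connected interaction pattern are illustrative assumptions, not the thesis implementation.

```python
# Illustrative sketch (not the thesis code): a one-step graph dynamics model
# over a sparse set of keypoints, in the spirit of G-DOOM's structured state.
import torch
import torch.nn as nn

class KeypointGraphDynamics(nn.Module):
    """Predicts next keypoint states from current keypoint states and an action."""
    def __init__(self, state_dim=4, action_dim=4, hidden_dim=128):
        super().__init__()
        # Edge network: message from keypoint j to keypoint i.
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim))
        # Node network: update each keypoint from its aggregated messages.
        self.node_mlp = nn.Sequential(
            nn.Linear(state_dim + hidden_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim))

    def forward(self, keypoints, action):
        # keypoints: (B, K, state_dim); action: (B, action_dim)
        B, K, D = keypoints.shape
        sender = keypoints.unsqueeze(2).expand(B, K, K, D)
        receiver = keypoints.unsqueeze(1).expand(B, K, K, D)
        messages = self.edge_mlp(torch.cat([sender, receiver], dim=-1))
        agg = messages.sum(dim=2)                      # (B, K, hidden_dim)
        act = action.unsqueeze(1).expand(B, K, -1)
        delta = self.node_mlp(torch.cat([keypoints, agg, act], dim=-1))
        return keypoints + delta                       # predicted next keypoint states
```

For example, `KeypointGraphDynamics()(torch.randn(1, 8, 4), torch.randn(1, 4))` would roll eight keypoints forward by one step; in practice such a model would be unrolled recurrently and trained against tracked keypoints.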
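The contrastive objective described for CVRL maximizes mutual information between observations and latent states. A common way to realize such an objective is an InfoNCE-style loss over a batch; the sketch below assumes that formulation and generic encoder outputs, and is not CVRL's exact loss.

```python
# Illustrative sketch (assumed InfoNCE-style formulation, not CVRL's exact loss):
# score each latent state against its own observation embedding versus the other
# observations in the batch, instead of reconstructing pixels generatively.
import torch
import torch.nn.functional as F

def contrastive_latent_loss(latents, obs_embeddings, W):
    """latents: (B, Dz); obs_embeddings: (B, Do); W: (Dz, Do) bilinear weight."""
    logits = latents @ W @ obs_embeddings.t()          # (B, B) pairwise scores
    labels = torch.arange(latents.size(0), device=latents.device)
    # Matching pairs (the diagonal) should out-score all in-batch negatives,
    # which lower-bounds the mutual information between latents and observations.
    return F.cross_entropy(logits, labels)
```

Because the model only has to discriminate the correct observation from negatives, it never needs to explain distractor pixels, which is the robustness argument made in the abstract.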
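For the differentiable particle filter that DPFRL and PF-RNNs build on, the following sketch shows one belief-update step with soft resampling so that gradients can flow through the importance weights. The `transition` and `observation_logprob` callables and the tensor shapes are assumed interfaces for illustration, not the DPFRL code.

```python
# Illustrative sketch (assumed interfaces, not the DPFRL code): one step of a
# differentiable particle filter over latent particles and log-weights.
import torch

def particle_filter_step(particles, log_weights, action, obs_feat,
                         transition, observation_logprob, alpha=0.5):
    # particles: (B, K, Dz); log_weights: (B, K)
    particles = transition(particles, action)              # stochastic transition model
    log_weights = log_weights + observation_logprob(particles, obs_feat)
    log_weights = log_weights - torch.logsumexp(log_weights, dim=1, keepdim=True)

    # Soft resampling: sample from a mixture of the belief and a uniform
    # distribution, then correct with importance weights so gradients survive.
    B, K, Dz = particles.shape
    probs = alpha * log_weights.exp() + (1 - alpha) / K
    idx = torch.multinomial(probs, K, replacement=True)     # (B, K) ancestor indices
    particles = torch.gather(particles, 1, idx.unsqueeze(-1).expand(-1, -1, Dz))
    new_log_weights = torch.log(torch.gather(log_weights.exp(), 1, idx) + 1e-8) \
                      - torch.log(torch.gather(probs, 1, idx))
    new_log_weights = new_log_weights - torch.logsumexp(new_log_weights, 1, keepdim=True)
    return particles, new_log_weights
```

The resulting weighted particle set is the belief the policy conditions on; stacking this step over time yields the recurrent, end-to-end trainable structure the abstract refers to.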
dc.language.iso | en
dc.subject | Reinforcement Learning, Representation Learning, Robotics
dc.type | Thesis
dc.contributor.department | COMPUTER SCIENCE
dc.contributor.supervisor | Xu Ye, David Hsu
dc.description.degree | Ph.D
dc.description.degreeconferred | DOCTOR OF PHILOSOPHY (SOC)
dc.identifier.orcid | 0000-0001-5466-867X
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File | Size | Format | Access Settings
thesis_ma_xiao.pdf | 37.14 MB | Adobe PDF | OPEN
