arXiv:2310.19805

Sample Efficient Reward Augmentation in offline-to-online Reinforcement Learning

Published on Oct 7, 2023

Abstract

Offline-to-online RL can make full use of pre-collected offline datasets to initialize policies, resulting in higher sample efficiency and better performance than training with online algorithms alone. However, directly fine-tuning the pre-trained policy tends to yield sub-optimal performance. A primary reason is that conservative offline RL methods diminish the agent's capability for exploration, which in turn hurts online fine-tuning performance. To encourage exploration during online fine-tuning and improve overall fine-tuning performance, we propose a generalized reward augmentation method called Sample Efficient Reward Augmentation (SERA). Specifically, SERA encourages the agent to explore by computing a Q-conditioned entropy as an intrinsic reward. Its advantage is that it can make broad use of the offline pre-trained Q-function to encourage the agent to cover the state space uniformly, while accounting for the imbalance between the distributions of high-value and low-value states. Additionally, SERA can be effortlessly plugged into various RL algorithms to improve online fine-tuning and ensure sustained asymptotic improvement. Extensive experimental results demonstrate that, on offline-to-online problems, SERA consistently and effectively enhances the performance of various offline algorithms.
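
Since the abstract describes the core mechanism (an intrinsic bonus derived from Q-conditioned state entropy, added on top of the environment reward), here is a minimal sketch of what one such augmentation step could look like. This is an illustration, not the authors' implementation: the k-nearest-neighbor entropy estimator, the way the pre-trained Q-values weight the bonus, and the coefficient `beta` are all assumptions made for the example.

```python
# Hedged sketch of a SERA-style reward augmentation step. NOT the paper's
# code: the k-NN entropy estimate, the Q-based weighting, and `beta` are
# illustrative assumptions based only on the abstract.
import numpy as np

def knn_state_entropy(states: np.ndarray, k: int = 5) -> np.ndarray:
    """Particle-based entropy estimate per state: log distance to the k-th
    nearest neighbor within the batch (up to additive/multiplicative
    constants). Requires a batch larger than k."""
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    kth = np.sort(dists, axis=1)[:, k]  # column 0 is each point itself
    return np.log(kth + 1e-8)

def sera_augmented_reward(ext_reward, states, q_values, beta=0.1):
    """Add a Q-conditioned entropy bonus to the extrinsic reward.

    `q_values` come from the offline pre-trained critic; down-weighting the
    bonus for high-Q states is one plausible way to address the imbalance
    between high-value and low-value state distributions mentioned in the
    abstract (an assumption, not the paper's exact scheme).
    """
    bonus = knn_state_entropy(states)
    # Normalize Q to [0, 1]; low-value states get a larger exploration bonus.
    q_norm = (q_values - q_values.min()) / (np.ptp(q_values) + 1e-8)
    return ext_reward + beta * (1.0 - q_norm) * bonus

# Example usage with random placeholder data:
rng = np.random.default_rng(0)
s = rng.normal(size=(256, 4))   # batch of states
r = rng.normal(size=256)        # extrinsic rewards
q = rng.normal(size=256)        # pre-trained Q estimates
r_aug = sera_augmented_reward(r, s, q)
```

Because the bonus only reshapes the scalar reward, a plug-in of this kind leaves the underlying RL algorithm untouched, which is consistent with the abstract's claim that SERA can be combined with various offline algorithms during online fine-tuning.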
