arXiv:2304.12886

What can online reinforcement learning with function approximation benefit from general coverage conditions?

Published on Apr 25, 2023
Abstract

In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), using a certain coverage condition (originally from offline RL) is enough to ensure sample-efficient guarantees (Xie et al., 2023). In this work, we pursue this new direction by investigating more general coverage conditions and studying their potential and utility for efficient online RL. We identify further concepts, including the L^p variant of concentrability, density ratio realizability, and a trade-off between partial and rest-part coverage conditions, that are also beneficial to sample-efficient online RL and yield improved regret bounds. Furthermore, if exploratory offline data are used, then under our coverage conditions both statistically and computationally efficient guarantees can be achieved for online RL. Moreover, even when the MDP structure is given, e.g., a linear MDP, we show that good coverage conditions remain beneficial, allowing regret bounds faster than O(√T) and even of logarithmic order. These results provide a strong justification for the use of general coverage conditions in efficient online RL.
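For context, a minimal sketch of the standard quantities involved, assuming a reference data distribution μ over state-action pairs and the occupancy measure d^π of a policy π; the paper's exact formulations may differ:

C_∞^π = sup_{s,a} d^π(s,a) / μ(s,a)                          (classical L^∞ concentrability)
C_p^π = ( E_{(s,a)∼μ} [ (d^π(s,a) / μ(s,a))^p ] )^{1/p}      (L^p variant)

Under this sketch, density ratio realizability asks only that the ratio d^π/μ lie in (or be well approximated by) a given function class, rather than being uniformly bounded.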
