

PUBMED FOR HANDHELDS



  • Title: Mild Policy Evaluation for Offline Actor-Critic.
    Author: Huang L, Dong B, Lu J, Zhang W.
    Journal: IEEE Trans Neural Netw Learn Syst; 2023 Sep 07 (epub ahead of print). PubMed ID: 37676802.
    Abstract:
    In offline actor-critic (AC) algorithms, the distributional shift between the training data and the target policy causes optimistic Q value estimates for out-of-distribution (OOD) actions. This skews the learned policy toward OOD actions with falsely high Q values. Existing value-regularized offline AC algorithms address this issue by learning a conservative value function, which causes a performance drop. In this article, we propose mild policy evaluation (MPE), which constrains the difference between the Q values of actions supported by the target policy and those of actions contained in the offline dataset. We analyze the convergence of the proposed MPE, the gap between the learned value function and the true one, and the suboptimality of offline AC with MPE. A mild offline AC (MOAC) algorithm is developed by integrating MPE into off-policy AC. In contrast to existing offline AC algorithms, the value function gap of MOAC is bounded even in the presence of sampling errors; moreover, in the absence of sampling errors, the true state value function can be obtained. Experimental results on the D4RL benchmark demonstrate the effectiveness of MPE and the superior performance of MOAC over state-of-the-art offline reinforcement learning (RL) algorithms.
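    Note: The abstract describes MPE only at a high level, so the following is a minimal illustrative sketch, in Python/PyTorch, of one way such a constraint could be attached to a critic update. The hinge-shaped penalty, the names (mpe_critic_loss, q_net, q_target_net, policy), and the weight lambda_mpe are assumptions made for illustration, not the paper's exact objective.

    # Illustrative sketch of a "mild" policy-evaluation critic loss, based only
    # on the abstract above. The hinge-style penalty, the names, and the
    # weighting are assumptions; the paper's exact MPE objective may differ.
    import torch
    import torch.nn.functional as F


    def mpe_critic_loss(q_net, q_target_net, policy, batch, gamma=0.99, lambda_mpe=1.0):
        """TD loss plus a penalty discouraging Q(s, a_pi) from exceeding Q(s, a_data).

        batch: dict with tensors 'obs', 'actions', 'rewards', 'next_obs', 'dones'.
        q_net, q_target_net: callables mapping (obs, action) -> Q value.
        policy: callable mapping obs -> action.
        """
        obs, actions = batch["obs"], batch["actions"]
        rewards, next_obs, dones = batch["rewards"], batch["next_obs"], batch["dones"]

        # Standard TD target computed with the target critic and the current policy.
        with torch.no_grad():
            next_actions = policy(next_obs)
            td_target = rewards + gamma * (1.0 - dones) * q_target_net(next_obs, next_actions)

        q_data = q_net(obs, actions)        # Q values of dataset (in-distribution) actions
        td_loss = F.mse_loss(q_data, td_target)

        # Assumed "mild" constraint: penalize only the positive part of the gap
        # between Q values of policy-proposed actions and Q values of dataset
        # actions, rather than uniformly suppressing all Q values.
        policy_actions = policy(obs).detach()
        q_pi = q_net(obs, policy_actions)
        mpe_penalty = F.relu(q_pi - q_data.detach()).mean()

        return td_loss + lambda_mpe * mpe_penalty

    In a full off-policy AC training loop, this loss would stand in for the usual critic loss while the actor is still trained to maximize the learned Q values, which is roughly how the abstract describes MOAC arising from integrating MPE into off-policy AC.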