PubMed for Handhelds
Title: A3C-GS: Adaptive Moment Gradient Sharing With Locks for Asynchronous Actor-Critic Agents.
Author: Labao AB, Martija MAM, Naval PC.
Journal: IEEE Trans Neural Netw Learn Syst; 2021 Mar; 32(3):1162-1176.
PubMed ID: 32287019.
Abstract: We propose an asynchronous gradient sharing mechanism for parallel actor-critic algorithms with improved exploration characteristics. The proposed algorithm (A3C-GS) automatically diversifies worker policies in the short term for exploration, which reduces the need for entropy loss terms. Despite this policy diversification, the algorithm converges to the optimal policy in the long term. We show in our analysis that the gradient sharing operation is a composition of two contractions. The first contraction performs gradient computation, while the second is a gradient sharing operation coordinated by locks. Distinct short- and long-term properties follow from these two contractions. In the short term, gradient sharing induces temporary heterogeneity among worker policies, which provides the needed exploration. In the long term, under a suitably small learning rate and gradient clipping, convergence to the optimal policy is theoretically guaranteed. We verify our results with several high-dimensional experiments and compare A3C-GS against other on-policy policy-gradient algorithms. Our proposed algorithm achieved the highest weighted score. Despite using lower entropy weights, it performed well both in high-dimensional environments whose sparse rewards demand exploration and in 3-D environments that require navigation for long survival tasks. It consistently outperformed the baseline asynchronous advantage actor-critic (A3C) algorithm.
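The abstract describes asynchronous workers that share gradients through a lock-coordinated, adaptive-moment (Adam-style) update, with gradient clipping and a small learning rate assumed in the convergence argument. What follows is a minimal sketch of that general pattern, not the authors' A3C-GS update. `SharedAdam`, `worker`, and `train` are hypothetical names. A noisy quadratic loss stands in for the actor-critic loss, and clipping by global L2 norm is an assumed choice. Per-worker noise and stale parameter snapshots give each worker a slightly different short-term trajectory, while the shared moment estimates pull every worker toward the same optimum.

```python
import math
import random
import threading


class SharedAdam:
    """Hypothetical shared optimizer: parameters plus Adam first/second moments.

    Every worker pushes its gradient here; a single lock makes each
    moment-and-parameter update atomic so concurrent writes never interleave.
    """

    def __init__(self, params, lr=0.01, betas=(0.9, 0.999), eps=1e-8, clip=1.0):
        self.params = list(params)
        self.m = [0.0] * len(self.params)  # first-moment (mean) estimates
        self.v = [0.0] * len(self.params)  # second-moment (uncentered variance) estimates
        self.t = 0                         # number of shared updates applied
        self.lr, self.b1, self.b2 = lr, betas[0], betas[1]
        self.eps, self.clip = eps, clip
        self.lock = threading.Lock()

    def share_gradient(self, grad):
        # Assumed choice: clip by global L2 norm before sharing.
        norm = math.sqrt(sum(g * g for g in grad))
        if norm > self.clip:
            grad = [g * self.clip / norm for g in grad]
        with self.lock:
            self.t += 1
            for i, g in enumerate(grad):
                self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
                self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
                m_hat = self.m[i] / (1 - self.b1 ** self.t)  # bias correction
                v_hat = self.v[i] / (1 - self.b2 ** self.t)
                self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
            # Snapshot taken under the lock so the worker resynchronizes consistently.
            return list(self.params)


def worker(shared, optimum, steps, seed):
    rng = random.Random(seed)
    with shared.lock:
        local = list(shared.params)
    for _ in range(steps):
        # Toy stand-in for an actor-critic gradient: noisy gradient of
        # 0.5 * ||theta - optimum||^2, evaluated at a possibly stale local copy.
        grad = [p - c + rng.gauss(0.0, 0.1) for p, c in zip(local, optimum)]
        local = shared.share_gradient(grad)


def train(n_workers=4, steps=500, optimum=(1.0, -2.0, 0.5)):
    shared = SharedAdam([0.0] * len(optimum), lr=0.01)
    threads = [
        threading.Thread(target=worker, args=(shared, optimum, steps, seed))
        for seed in range(n_workers)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared.params
```

Calling `train()` runs four threads against one shared optimizer and should return parameters close to the toy optimum `(1.0, -2.0, 0.5)`. The lock keeps the shared moments consistent at the cost of serializing workers during each update; the small learning rate and clipping mirror the conditions the abstract names for long-term convergence.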