120 related articles for the article with PubMed ID 35002545
1. Learning and Planning for Time-Varying MDPs Using Maximum Likelihood Estimation. Ornik M; Topcu U. J Mach Learn Res; 2021; 22:1-40. PubMed ID: 35002545
2. Safety-Guaranteed, Accelerated Learning in MDPs with Local Side Information. Thangeda P; Ornik M. Proc Am Control Conf; 2020 Jul; 2020:1099-1104. PubMed ID: 33223606
3. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems? Ochi K; Kamiura M. Biosystems; 2015 Sep; 135:55-65. PubMed ID: 26166266
4. An immediate-return reinforcement learning for the atypical Markov decision processes. Pan Z; Wen G; Tan Z; Yin S; Hu X. Front Neurorobot; 2022; 16:1012427. PubMed ID: 36582302
5. Uncertainty and exploration in a restless bandit problem. Speekenbrink M; Konstantinidis E. Top Cogn Sci; 2015 Apr; 7(2):351-67. PubMed ID: 25899069
6. Learning parametric policies and transition probability models of Markov decision processes from data. Xu T; Zhu H; Paschalidis IC. Eur J Control; 2021 Jan; 57:68-75. PubMed ID: 33716408
7. Learning Dynamics and Control of a Stochastic System under Limited Sensing Capabilities. Zadenoori MA; Vicario E. Sensors (Basel); 2022 Jun; 22(12). PubMed ID: 35746272
8. An empirical evaluation of active inference in multi-armed bandits. Marković D; Stojić H; Schwöbel S; Kiebel SJ. Neural Netw; 2021 Dec; 144:229-246. PubMed ID: 34507043
10. Markov decision processes: a tool for sequential decision making under uncertainty. Alagoz O; Hsu H; Schaefer AJ; Roberts MS. Med Decis Making; 2010; 30(4):474-83. PubMed ID: 20044582
11. Revisiting the Role of Uncertainty-Driven Exploration in a (Perceived) Non-Stationary World. Guo D; Yu AJ. Cogsci; 2021 Jul; 43:2045-2051. PubMed ID: 34368809
12. Reinforcement Learning-Aided Channel Estimator in Time-Varying MIMO Systems. Kim TK; Min M. Sensors (Basel); 2023 Jun; 23(12). PubMed ID: 37420854
13. A Maximum Divergence Approach to Optimal Policy in Deep Reinforcement Learning. Yang Z; Qu H; Fu M; Hu W; Zhao Y. IEEE Trans Cybern; 2023 Mar; 53(3):1499-1510. PubMed ID: 34478393
14. Parameterized MDPs and Reinforcement Learning Problems-A Maximum Entropy Principle-Based Framework. Srivastava A; Salapaka SM. IEEE Trans Cybern; 2022 Sep; 52(9):9339-9351. PubMed ID: 34406959
15. Do not Bet on the Unknown Versus Try to Find Out More: Estimation Uncertainty and "Unexpected Uncertainty" Both Modulate Exploration. Payzan-Lenestour E; Bossaerts P. Front Neurosci; 2012; 6:150. PubMed ID: 23087606
16. Theory of choice in bandit, information sampling and foraging tasks. Averbeck BB. PLoS Comput Biol; 2015 Mar; 11(3):e1004164. PubMed ID: 25815510
17. Some performance considerations when using multi-armed bandit algorithms in the presence of missing data. Chen X; Lee KM; Villar SS; Robertson DS. PLoS One; 2022; 17(9):e0274272. PubMed ID: 36094920
18. On Practical Robust Reinforcement Learning: Adjacent Uncertainty Set and Double-Agent Algorithm. Hwang U; Hong S. IEEE Trans Neural Netw Learn Syst; 2024 Apr; PP. PubMed ID: 38619960
19. Human Belief State-Based Exploration and Exploitation in an Information-Selective Symmetric Reversal Bandit Task. Horvath L; Colcombe S; Milham M; Ray S; Schwartenbeck P; Ostwald D. Comput Brain Behav; 2021; 4(4):442-462. PubMed ID: 34368622
20. Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control. Uragami D; Takahashi T; Matsuo Y. Biosystems; 2014 Feb; 116:1-9. PubMed ID: 24296286