Adaptive Choice of Grid and Time in Reinforcement Learning

Abstract

We propose local error estimates together with algorithms for adaptive a posteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time and an infinite-horizon discounted cost functional. For grid refinement we follow the procedure of numerical methods for the Bellman equation. For time refinement we propose a new criterion, based on consistency estimates of discrete solutions of the Bellman equation. We demonstrate that an optimal ratio of time to space discretization is crucial for optimal learning rates and accuracy of the approximate optimal value function.

In 1987 – 2019 Neural Information Processing Systems Foundation, Inc.