DDQN: Metrics for measuring training stability, using replay buffer size and minibatch size as examples

Abstract

The reinforcement learning algorithm Double Deep Q-Network (DDQN) is known to have an unstable training process (Halat and Ebadzadeh, 2021). As a step toward overcoming this instability, this paper aims to deepen the understanding of stability and of how to measure it. To this end, numerical indicators are proposed to quantify convergence and experiment stability. These metrics are then evaluated on the CartPole environment by varying the replay buffer size and the minibatch size. Experimental results show that the minibatch size has the greater impact on stability: the highest stability is achieved with the smallest minibatch size of 10, yielding up to 4.6 % higher stability.
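As a rough illustration of the experimental setup summarized above (not the paper's actual code), the following Python sketch enumerates a grid over replay buffer sizes and minibatch sizes, runs several seeds per configuration, and aggregates the per-seed returns into a simple dispersion-based stability score. The grid values (apart from the minibatch size of 10 named in the abstract), the `train_ddqn` stub, and the `stability_score` definition are all hypothetical placeholders, not the paper's indicators.

```python
import random
import statistics
from itertools import product

# Hypothetical grid; the abstract only fixes the smallest minibatch size (10).
REPLAY_BUFFER_SIZES = [10_000, 50_000, 100_000]
MINIBATCH_SIZES = [10, 32, 64]
SEEDS = range(5)


def train_ddqn(buffer_size: int, batch_size: int, seed: int) -> float:
    """Placeholder for one DDQN training run on CartPole.

    A real implementation would train the agent (networks, target
    updates, epsilon schedule) and return its mean evaluation return;
    a seeded random stand-in keeps this sketch runnable.
    """
    rng = random.Random(hash((buffer_size, batch_size, seed)))
    return rng.uniform(0.0, 500.0)


def stability_score(returns: list[float]) -> float:
    """One plausible dispersion-based indicator (not the paper's metric):
    closer to 1 means less variation across seeds."""
    mean = statistics.mean(returns)
    return (1.0 - statistics.stdev(returns) / mean) if mean else 0.0


for buffer_size, batch_size in product(REPLAY_BUFFER_SIZES, MINIBATCH_SIZES):
    returns = [train_ddqn(buffer_size, batch_size, s) for s in SEEDS]
    print(f"buffer={buffer_size:>7} batch={batch_size:>3} "
          f"stability={stability_score(returns):.3f}")
```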