Application of Reward Learning to Generate News

Abstract

This paper examines the use of proximal policy optimization applied to pre-trained neural language models based on the transformer architecture. The approach is then used to generate convincing news.