Intrinsic reward driven exploration for deep reinforcement learning
Deep reinforcement learning has become one of the most active research topics in machine learning. In reinforcement learning, an agent interacts with its environment, and the goal is to find a policy that maximises the agent’s expected cumulative reward. Some environments provide only extremely sparse rewards, so the agent must learn to explore its environment efficiently in order to find them. Exploration in complex environments is, however, a key challenge for deep reinforcement learning, especially in tasks where rewards are very sparse. This thesis investigates intrinsic reward driven exploration strategies. An agent driven by an intrinsic reward can explore more efficiently and so find the sparse extrinsic rewards provided by the environment. Recently, surprise has been used as an intrinsic reward that encourages systematic and efficient exploration. We first define a novel intrinsic reward function called assorted surprise, and propose the Variational Assorted Surprise Exploration (VASE) algorithm, which approximates assorted surprise tractably with the help of Bayesian neural networks. We then apply VASE to continuous control problems and to large-scale Atari video games. Experimental results show that VASE performs well across these tasks. We then find that all surprise-based exploration methods lose exploration efficiency in areas where the environment's transitions are discontinuous. To solve this problem, we propose the Mutual Information Minimising Exploration (MIME) algorithm. We show that MIME explores as efficiently as surprise-based methods in other areas of the environment, and much more efficiently in areas with discontinuous transitions.
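As a rough illustration of the general idea of using surprise as an intrinsic reward, the sketch below scores a transition by the negative log-likelihood of the observed next state under a predicted Gaussian, then adds a scaled copy of that score to the extrinsic reward. This is a generic textbook-style form, not the assorted surprise defined in the thesis and not the VASE or MIME algorithm. The function names `gaussian_surprise` and `shaped_reward`, the diagonal Gaussian prediction, and the weight `eta` are all assumptions made for illustration.

```python
import math


def gaussian_surprise(predicted_mean, predicted_std, observed):
    """Negative log-likelihood of `observed` under independent Gaussians.

    Hypothetical helper: assumes a dynamics model that predicts a mean and
    a standard deviation for each dimension of the next state.
    """
    nll = 0.0
    for mu, sigma, x in zip(predicted_mean, predicted_std, observed):
        # Per-dimension Gaussian negative log-density.
        nll += 0.5 * math.log(2.0 * math.pi * sigma ** 2)
        nll += (x - mu) ** 2 / (2.0 * sigma ** 2)
    return nll


def shaped_reward(extrinsic, surprise, eta=0.1):
    """Extrinsic reward plus a scaled intrinsic bonus (eta is an assumed weight)."""
    return extrinsic + eta * surprise


# A transition the model predicts well yields a smaller bonus than one it
# predicts badly, so poorly modelled regions attract more exploration.
well_predicted = gaussian_surprise([0.0, 1.0], [1.0, 1.0], [0.1, 0.9])
badly_predicted = gaussian_surprise([0.0, 1.0], [1.0, 1.0], [3.0, -2.0])
```

With a sparse extrinsic reward of zero on most steps, `shaped_reward(0.0, badly_predicted)` still gives the agent a learning signal that favours transitions its model predicts poorly.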
Advisor: McCane, Brendan; Szymanski, Lech
Degree Name: Doctor of Philosophy
Degree Discipline: Department of Computer Science
Publisher: University of Otago
Keywords: New Zealand; Surprise-driven learning; Reinforcement learning; Intrinsic reward driven learning
Research Type: Thesis