Uğurcan Özalp, M.Sc.
Department of Scientific Computing
August 2021

Supervisor: Ömür Uğur (Institute of Applied Mathematics, Middle East Technical University, Ankara)

Abstract

Deep Reinforcement Learning methods have been applied successfully to mechanical control in many environments, replacing traditional optimal and adaptive control methods in some complex problems. However, Deep Reinforcement Learning algorithms still face challenges. One of them is control in partially observable environments: when an agent is not fully informed about the state of the environment, it must recover the missing information from past observations. In this thesis, walking in the Bipedal Walker Hardcore environment (OpenAI Gym), which is partially observable, is studied with two continuous-action actor-critic reinforcement learning algorithms: Twin Delayed Deep Deterministic Policy Gradient and Soft Actor-Critic. Three neural network architectures are implemented. The first is a Residual Feed-Forward Neural Network, used under the assumption of a fully observable environment; the second and third are a Long Short-Term Memory network and a Transformer, which take the observation history as input to recover the information hidden by partial observability.

Keywords: deep reinforcement learning, partial observability, robot control, actor-critic methods, long short-term memory, transformer

Middle East Technical University, Institute of Applied Mathematics, Üniversiteler Mahallesi, Dumlupınar Bulvarı No:1, 06800 Çankaya/Ankara