Arun Balajiee

PhD Student, Intelligent Systems Program, University of Pittsburgh

Hacking your way to RL in the Real World (someday)

20 Nov 2020 - Arun Balajiee

Talk Speaker: Ed Grefenstette

Talk Date: 11/20/2020

In this talk, Ed discussed recent advances in reinforcement learning and its applications in building agents that perform at near-human efficiency in several contexts. However, to keep improving RL agents and to develop more dynamic policies that work across different scenarios, these agents currently can only be trained and tested in virtual environments built for that purpose. The main focus of Ed and his colleagues' research is building good environments that can be used to train more robust models with dynamic policies, so that these agents can one day be deployed in real-world scenarios. In many cases a reward function cannot be formalized at all, while in others the RL agent has to be trained with sparse or expensive rewards that simulate real-world circumstances. Further, in cases where direct simulation is not possible, a well-designed RL environment could be the key to building agents with dynamic policies for these scenarios, since the action and state spaces can be too large to be devised mathematically or computationally.

Ed talked about building an environment around NetHack, a classic 2D dungeon game, as a testbed for RL agents. This environment allows for improved exploration in RL policies, so that agents can learn behaviours that carry over to real-world contexts. NetHack levels are procedurally generated, so the agent keeps encountering new states, and higher reward is assigned to states that are more interesting. This way the rewards are intrinsic to the states and do not vanish as the agent trains. He also talked about RIDE, an impact-driven exploration method that rewards the agent for actions that substantially change its learned representation of the state, encouraging far more exploration than is usual when designing RL agents. In these settings the states are deliberately challenging, so that the intrinsic reward mimics a real human scenario.
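To make the idea of an impact-driven intrinsic reward more concrete, here is a minimal Python sketch of how a RIDE-style reward could be computed. This is not the authors' implementation; the embedding network, its dimensions, and the function names are assumptions for illustration, but it captures the core idea of rewarding large changes in a learned state representation while discounting states that are revisited within an episode.

```python
import torch
import torch.nn as nn

# Illustrative sketch of an impact-driven intrinsic reward in the spirit of RIDE:
# reward actions that substantially change a learned state embedding, scaled down
# by how often the resulting state has already been visited in the episode.
# The embedding model and all dimensions below are assumptions, not the paper's code.

class StateEmbedding(nn.Module):
    """Maps raw observations to a compact latent representation."""
    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def intrinsic_reward(phi: StateEmbedding,
                     obs: torch.Tensor,
                     next_obs: torch.Tensor,
                     episodic_visit_count: int) -> torch.Tensor:
    """Reward = ||phi(s') - phi(s)|| / sqrt(N_ep(s')): large, novel changes in the
    latent state are rewarded, and frequently revisited states contribute less."""
    with torch.no_grad():
        impact = torch.norm(phi(next_obs) - phi(obs), p=2)
    return impact / (episodic_visit_count ** 0.5)

# Example usage with random vectors standing in for environment observations.
phi = StateEmbedding(obs_dim=10)
s, s_next = torch.randn(10), torch.randn(10)
print(float(intrinsic_reward(phi, s, s_next, episodic_visit_count=3)))
```

Because the reward depends on how much the agent's own representation of the world changes, rather than on a fixed novelty count, it keeps driving exploration even late in training.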

In the second segment of the talk, he discussed designing RL agents that use natural-language cues during training, much as humans do. By reading a text such as a wiki, the RL agent has to navigate different levels of the environment, succeed at the task, and identify new, interesting states. This mimics how a highly intelligent human would perform in such an environment, drawing information from different sources to succeed at the task. An interesting ramification is that such an RL policy could handle multi-modal data, bringing the model a step closer to the complexity of the human mind.
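As a rough sketch of what "reading to act" could look like architecturally, the snippet below conditions a policy on both an encoded instruction and the current observation. This is only an illustrative toy model, not the system Ed described; the module names, dimensions, and the simple GRU text reader are all assumptions.

```python
import torch
import torch.nn as nn

# Toy language-conditioned policy: the instruction text and the observation are
# embedded separately, concatenated, and fed to an action head. All shapes and
# module choices here are illustrative assumptions.

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, vocab_size: int, obs_dim: int, n_actions: int,
                 text_dim: int = 32, obs_embed_dim: int = 32):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, text_dim)
        self.text_reader = nn.GRU(text_dim, text_dim, batch_first=True)
        self.obs_encoder = nn.Sequential(
            nn.Linear(obs_dim, obs_embed_dim), nn.ReLU(),
        )
        self.action_head = nn.Linear(text_dim + obs_embed_dim, n_actions)

    def forward(self, tokens: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        # Read the instruction with a small recurrent encoder.
        _, h = self.text_reader(self.token_embedding(tokens))
        text_feat = h.squeeze(0)                    # (batch, text_dim)
        obs_feat = self.obs_encoder(obs)            # (batch, obs_embed_dim)
        # Condition the action distribution on both modalities.
        logits = self.action_head(torch.cat([text_feat, obs_feat], dim=-1))
        return torch.log_softmax(logits, dim=-1)    # log-probabilities over actions

# Example: one tokenized instruction of length 7 and one observation vector.
policy = LanguageConditionedPolicy(vocab_size=100, obs_dim=10, n_actions=5)
log_probs = policy(torch.randint(0, 100, (1, 7)), torch.randn(1, 10))
print(log_probs.shape)  # torch.Size([1, 5])
```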

This was a very interesting talk, as it showed me the different applications and new possibilities that exist in reinforcement learning research. Building better models requires exposing them to new and challenging scenarios so that they can perform useful functions for humans in the real world. It is clear, then, that such research is imperative for designing AI agents that become more capable of solving human problems computationally.