🏆📚 Edge#103: Reinforcement Learning Recap

TheSequence is the best way to build and reinforce your knowledge about machine learning and AI

Every six months we provide a summary of the key topics covered in TheSequence. Catch up with what you missed and prepare for the next half of the year! The ML&AI universe is expanding. This issue is a complete recap of our series on reinforcement learning, one of the most popular and yet most misunderstood disciplines in deep learning. It’s our longest series so far, spanning twelve Edges. Let’s start with a brief introduction to the field:

💡What is Reinforcement Learning?

Conceptually, reinforcement learning (RL) methods focus on mastering an environment or task by interacting with it on a trial-and-error basis in real time. For instance, reinforcement learning systems like AlphaGo and AlphaZero were able to master the game of Go largely by playing it. They were also able to rediscover strategies used by the top masters in the world without having prior knowledge of them. This methodology resembles the cognitive mechanisms babies develop when learning new tasks. From an architecture standpoint, a reinforcement learning model is based on four fundamental components:

  • Agent: The intelligent program trying to learn a new task.  

  • Environment: The programmatic world that the agent interacts with to execute the target tasks.  

  • Rewards: A function that produces a score quantifying how the agent performs with respect to the environment.  

  • Policy: An algorithm used by the agent to decide which action to take next.

In a nutshell, a reinforcement learning agent interacts with an environment and adjusts its policy based on the results of its actions, which are scored by the reward function.

Image credit: MIT Press
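To make the four components concrete, here is a minimal sketch of the interaction loop in plain Python. Everything in it is an assumption made for illustration: `CorridorEnv`, `random_policy`, `always_right_policy`, and `run_episode` are hypothetical names, and the toy corridor and its reward values are not taken from any of the Edges. The policies here are fixed, so nothing is learned yet; the point is only to show how agent, environment, rewards, and policy fit together.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); walls clamp the position."""
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        # Reward function (assumed values): +1 at the goal, small penalty otherwise.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def random_policy(state, rng):
    """Policy that ignores the state and picks a direction at random."""
    return rng.choice([-1, 1])


def always_right_policy(state, rng):
    """Policy that always moves toward the goal."""
    return 1


def run_episode(env, policy, rng, max_steps=100):
    """Agent loop: observe state, act via the policy, collect the reward."""
    state = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = policy(state, rng)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps
```

Calling `run_episode(CorridorEnv(), always_right_policy, random.Random(0))` reaches the goal in 4 steps with a total reward of roughly 0.7, while `random_policy` usually needs more steps and collects less reward. A learning algorithm would use that reward signal to move the policy from the second behavior toward the first.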

Although there are many methods and forms of reinforcement learning algorithms, they can be grouped into two main categories:  

  1. Model-Free: This type of algorithm does not build a model of the environment. Instead, it relies on sampling experience from the environment to learn an efficient policy.

  2. Model-Based: This type of algorithm starts from, or learns, a model of the environment, including its reward function, and uses that model to plan actions that maximize the reward.

The main difference is that a model-based algorithm tries to get familiar with its environment, while a model-free algorithm tries to optimize its policy directly from experience. In Edge#85, the first Edge about RL, we also covered OpenAI RL agents and the TensorForce RL framework.
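The contrast between the two families can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the five-cell corridor, the constants, and the function names (`model`, `backup`, `value_iteration`, `q_learning`) are hypothetical, the model-based side is simply handed the environment model instead of learning it, and value iteration and tabular Q-learning are used as standard textbook representatives of each family rather than as the methods of any specific system mentioned in this series.

```python
import random

N_STATES = 5        # cells 0..4; cell 4 is the terminal goal
ACTIONS = (-1, 1)   # move left or move right
GAMMA = 0.9         # discount factor (assumed value)


def model(state, action):
    """Deterministic dynamics and reward of the toy corridor."""
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def backup(values, state, action):
    """One-step lookahead through the model: r + gamma * V(next_state)."""
    next_state, reward, done = model(state, action)
    return reward + (0.0 if done else GAMMA * values[next_state])


def value_iteration(sweeps=50):
    """Model-based: plan by querying the known model, without acting."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            values[s] = max(backup(values, s, a) for a in ACTIONS)
    return [max(ACTIONS, key=lambda a: backup(values, s, a))
            for s in range(N_STATES - 1)]


def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Model-free: learn action values only from sampled transitions."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            # The learner treats `model` as a black box it can only sample.
            s2, r, done = model(s, a)
            target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return [max(ACTIONS, key=lambda b: q[(s, b)])
            for s in range(N_STATES - 1)]
```

Both functions return one action per non-terminal cell. On this corridor they should agree on moving right everywhere, but they get there differently: `value_iteration` never takes a step in the environment, while `q_learning` never looks inside the model and only sees the transitions it samples.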

In Edge#86, we wrote about how DeepMind prevents RL agents from getting “too clever”.

In Edge#87, we cover the concept of model-based reinforcement learning; how Google Dreamer uses model-based reinforcement learning to learn long-horizon tasks; and Uber Fiber, a distributed computing framework optimized for RL agents.

Edge#90 is about OpenAI Safety Gym, an environment for improving safety in RL models.

In Edge#91, we explain the concept of model-free reinforcement learning (MFRL); Agent57, an MFRL agent that outperformed the standard human benchmark on all 57 Atari games; DeepMind’s OpenSpiel – an open-source reinforcement learning framework for games.

In Edge#93, we cover what Q-Learning models are; how Google SEED RL architecture enables highly scalable RL tasks; Facebook’s ReAgent that is used for building reinforcement learning systems.   

Edge#95 is about Deep Q-Networks (DQN) reinforcement learning models; DeepMind’s RL agent that masters Quake III; OpenAI Gym – one of the most important technology stacks for modern reinforcement learning solutions. 

In Edge#97, we explain policy optimization RL methods (PO RL); how Google trained RL agents to master the most popular sport in the world; and DeepMind’s BSuite, a benchmark system for RL models.

Edge#98 is about how OpenAI built RL agents that mastered Montezuma’s Revenge by going backwards.

In Edge#99, we cover what trust region and proximal policy optimization are; RLlib, an open-source framework for highly scalable RL; and how OpenAI used PPO to master Dota 2.

Edge#100 is about the Facebook NetHack challenge, which is likely to become one of the toughest reinforcement learning benchmarks in history.

In Edge#101, we conclude the Reinforcement Learning series by discussing the exploration-exploitation dilemma; reviewing how Microsoft Research uses Bayesian exploration to address that dilemma in RL agents; and exploring TF-Agents, a modular RL library for TensorFlow.