Reinforcement Learning: An Introduction to the Concepts, Applications and Code

Part 1: An introduction to reinforcement learning, explaining common terms, concepts and applications.

In this series of reinforcement learning blog posts, I will try to give a simplified explanation of the concepts required to understand reinforcement learning and its applications. In this initial post, I highlight some of the main concepts and terminology in reinforcement learning. These concepts will be explained in more depth in future blog posts, along with applications and implementations on real-world problems.

Reinforcement Learning

Reinforcement learning (RL) can be viewed as an approach that falls between supervised and unsupervised learning. It is not strictly supervised because it does not rely solely on a set of labelled training data, and it is not unsupervised because there is a reward signal the agent is trying to maximise. The agent needs to find the “right” actions to take in different situations to achieve its overall goal.

Reinforcement learning is the science of decision making.

Reinforcement learning involves no supervisor; only a reward signal tells the agent whether it is doing well or not. Time is a key component in RL: the process is sequential and feedback is often delayed. Each action the agent takes affects the data it receives next.
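To make this interaction loop concrete, here is a minimal sketch of an agent acting in an environment and receiving only a scalar reward as feedback. The `CoinFlipEnv` class and the random agent are made-up toy stand-ins, not part of any RL library:

```python
import random

class CoinFlipEnv:
    """Toy environment: guess the coin flip, reward +1 if correct, -1 otherwise."""
    def step(self, action):
        outcome = random.choice(["heads", "tails"])
        reward = 1.0 if action == outcome else -1.0
        observation = outcome          # what the agent observes after acting
        done = False                   # this toy task never terminates on its own
        return observation, reward, done

env = CoinFlipEnv()
total_reward = 0.0

for t in range(10):
    action = random.choice(["heads", "tails"])    # the agent picks an action
    observation, reward, done = env.step(action)  # the environment responds
    total_reward += reward                        # reward is the only learning signal
    if done:
        break

print("cumulative reward:", total_reward)
```

There is no labelled “correct answer” at each step, only the reward signal accumulated over time, which is exactly what distinguishes this loop from supervised learning.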

Reinforcement Learning applied to Atari games by DeepMind

What is the reinforcement learning problem?

So far we have said that the agent needs to find the “right” action. The right action depends on the rewards.

Reward: The reward Rₜ is a scalar feedback signal which indicates how well the agent is doing at step time t.

In reinforcement learning we need to define our problem so that it satisfies the reward hypothesis. An example would be a game of chess, where the agent gets a positive reward for winning the game and a negative reward for losing it.

Reward Hypothesis: All goals can be described by the maximisation of expected cumulative reward.
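As a small illustration of the reward hypothesis, the goal “win the game of chess” can be encoded entirely as a scalar reward. The outcome labels below are hypothetical, chosen just for this sketch:

```python
def chess_reward(outcome: str) -> float:
    """Map a game outcome to a scalar reward signal.

    The agent only receives meaningful feedback at the end of the game;
    every intermediate move gets zero reward, so the feedback is delayed.
    """
    rewards = {"win": 1.0, "loss": -1.0, "draw": 0.0, "ongoing": 0.0}
    return rewards[outcome]

# The goal "win the game" is expressed purely as maximising expected reward.
print(chess_reward("win"))   # 1.0
print(chess_reward("loss"))  # -1.0
```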

Since the process involves sequential decision making, actions taken early on may have long-term consequences for the overall goal. Sometimes it may be better to sacrifice immediate reward (the reward Rₜ at time step t) to gain more long-term reward. In chess, for example, it may be worth sacrificing a pawn in order to capture a rook at a later stage.

Goal: The goal is to select actions to maximise total future reward.
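A short sketch of what “total future reward” means in code: sum the rewards the agent will receive from now on. The `gamma` parameter below is the standard discount factor, which is not introduced until later in this series; with `gamma = 1.0` this is just the plain cumulative reward.

```python
def total_future_reward(rewards, gamma=1.0):
    """Compute G = R_1 + gamma*R_2 + gamma^2*R_3 + ...

    With gamma = 1.0 this is the plain cumulative reward; a gamma < 1
    (the discount factor) weights immediate rewards more heavily than
    distant ones.
    """
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

# Sacrificing immediate reward can pay off: losing a pawn now (-1)
# to capture a rook later (+5) still yields a positive total reward.
print(total_future_reward([-1.0, 0.0, 5.0]))  # 4.0
```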