
Playing Atari with Deep Reinforcement Learning: Paper Notes

1. Abstract

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

2. Algorithm Approach:

This paper demonstrates that a convolutional neural network can overcome these challenges and learn successful control policies from raw video data in complex RL environments. The network is trained with a variant of the Q-learning [26] algorithm, using stochastic gradient descent to update the weights. To alleviate the problems of correlated data and non-stationary distributions, an experience replay mechanism [13] randomly samples previous transitions, smoothing the training distribution over many past behaviors, as sketched below.
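To make the replay mechanism concrete, here is a minimal sketch in Python (the names and structure are my own, not the paper's) of a uniform replay buffer together with the one-step Q-learning target the trained network regresses toward:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions; uniform random sampling
    breaks the strong temporal correlation between consecutive frames."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Each minibatch mixes behavior from many past episodes,
        # which smooths the training distribution as described above.
        return random.sample(self.buffer, batch_size)

def q_learning_target(reward, next_q_values, gamma, done):
    """One-step Q-learning target:
    y = r                            if the episode terminated,
    y = r + gamma * max_a' Q(s', a') otherwise."""
    return reward if done else reward + gamma * max(next_q_values)
```

The weights are then updated by stochastic gradient descent on the squared error between the network's prediction Q(s, a) and the target y.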

3. Objectives:

Our goal is to create a single neural network agent that is able to successfully learn to play as many of the games as possible.
(It learned from nothing but the video input, the reward and terminal signals, and the set of possible actions—just as a human player would.)
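The paper's agents interact with games through the Arcade Learning Environment. The loop below is a hypothetical modern stand-in using the Gymnasium API (the `ALE/Breakout-v5` id is illustrative and requires the `ale-py` package); it shows that only observations, rewards, terminal signals, and the action set cross the interface:

```python
import gymnasium as gym

env = gym.make("ALE/Breakout-v5")  # illustrative id; needs ale-py installed
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for the learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated      # the terminal signal
env.close()
```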

Our goal is to connect a reinforcement learning algorithm to a deep neural network which operates directly on RGB images and efficiently process training data by using stochastic gradient updates.
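The paper spells this network out concretely: the input is a stack of four preprocessed 84x84 grayscale frames (the raw RGB frames are downsampled and cropped first), followed by two convolutional layers with rectifier nonlinearities, a fully connected layer, and one linear output unit per action, so a single forward pass scores every action. Below is a sketch of that architecture in PyTorch; the framework choice is mine, not the paper's:

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Architecture from the paper: conv(16, 8x8, stride 4) -> ReLU ->
    conv(32, 4x4, stride 2) -> ReLU -> fc(256) -> ReLU -> linear head
    with one Q-value per action."""
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 4x84x84 -> 16x20x20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # -> 32x9x9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),                   # one Q-value per action
        )

    def forward(self, x):
        return self.net(x)
```

Outputting all action values at once is a deliberate design choice: it lets the agent evaluate max_a' Q(s', a') with a single forward pass instead of one pass per action.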