Reinforcement Learning: An Introduction ~ Limitations and Scope

阿新 • Published: 2019-01-03

1.4 Limitations and Scope

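Reinforcement learning relies heavily on the concept of state: the state is the input to the policy and the value function, and both the input to and the output of the model. Informally, we can think of the state as a signal that conveys to the agent some sense of "how the environment is" at a particular time. The formal definition of state used here is given by the framework of Markov decision processes in Chapter 3. More generally, however, we encourage the reader to follow the informal meaning and to think of the state as whatever information about its environment is available to the agent. In effect, we assume that the state signal is produced by some preprocessing system that is nominally part of the agent's environment. This book does not address the problems of constructing, changing, or learning the state signal (apart from a brief treatment in Section 17.3). We take this approach not because we consider state representation unimportant, but in order to focus fully on the decision-making problem. In other words, our main concern is not designing the state signal, but deciding what action to take as a function of whatever state signal is available.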

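Most of the reinforcement learning methods considered in this book are structured around estimating value functions, but doing so is not strictly necessary for solving reinforcement learning problems. For example, genetic algorithms, genetic programming, simulated annealing, and other optimization methods never estimate value functions. These methods apply multiple static policies, each of which interacts with a separate instance of the environment over an extended period of time. The policies that obtain the most reward, together with random variations of them, are carried over to the next generation of policies, and the process repeats. We call these evolutionary methods because their operation is analogous to the way biological evolution produces organisms with skilled behavior even though those organisms do not learn during their individual lifetimes. Evolutionary methods can be effective if the space of policies is sufficiently small, or can be structured so that good policies are common or easy to find, or if a lot of time is available for the search. In addition, evolutionary methods have an advantage on problems in which the learning agent cannot sense the complete state of its environment.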
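The evolutionary loop described above can be sketched in a few lines. This is only an illustrative sketch, not an algorithm taken from the book: the corridor environment, its reward (the index of the cell the agent lands in), and every name and parameter below (`lifetime_reward`, `mutate`, `evolve`, `n_elite`, `mutation_rate`) are assumptions made up for the example.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a corridor of cells 0..5, where each step pays the index of the new cell.
N_STATES = 6
ACTIONS = (-1, +1)    # step left or step right
MAX_STEPS = 20        # length of one "lifetime"


def lifetime_reward(policy):
    """Run one static policy in its own fresh environment instance."""
    state, total = 0, 0.0
    for _ in range(MAX_STEPS):
        # The policy is fixed for the whole lifetime: no learning happens here.
        state = min(max(state + policy[state], 0), N_STATES - 1)
        total += state
    return total


def mutate(policy, rate, rng):
    """Random variation: flip each action independently with probability `rate`."""
    return tuple(-a if rng.random() < rate else a for a in policy)


def evolve(generations=30, pop_size=20, n_elite=5, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.choice(ACTIONS) for _ in range(N_STATES))
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Only the total reward of each lifetime is used for selection.
        ranked = sorted(population, key=lifetime_reward, reverse=True)
        best_per_generation.append(lifetime_reward(ranked[0]))
        elite = ranked[:n_elite]
        # The best policies and random variations of them form the next generation.
        children = [mutate(rng.choice(elite), mutation_rate, rng)
                    for _ in range(pop_size - n_elite)]
        population = elite + children
    return max(population, key=lifetime_reward), best_per_generation


if __name__ == "__main__":
    best, history = evolve()
    print(best, history[0], history[-1])
```

Note that selection sees only one number per policy, the total reward of a whole lifetime. Nothing in the loop estimates a value function or looks at individual states and actions.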

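Our focus is on reinforcement learning methods that learn while interacting with the environment, which evolutionary methods do not do. In many cases, methods that can take advantage of the details of individual behavioral interactions are much more efficient than evolutionary methods. Evolutionary methods ignore much of the useful structure of the reinforcement learning problem: they do not use the fact that the policy they are searching for is a function from states to actions, and they do not notice which states an individual passes through during its lifetime or which actions it selects. In some cases this information can be misleading (for example, when states are misperceived), but more often it should enable more efficient search. Although evolution and learning share many features and naturally work together, we do not consider evolutionary methods by themselves to be especially well suited to reinforcement learning problems, and therefore we do not cover them in this book.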
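To make the contrast concrete, here is a minimal sketch of a method that learns during interaction, using each visited state, chosen action, and received reward. The section names no specific algorithm; tabular Q-learning is swapped in here only as a familiar example of a value-function method. The corridor environment, its reward (the index of the new cell), and all names and parameters (`step`, `q_learning`, `greedy_policy`, `alpha`, `gamma`) are assumptions for illustration.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a corridor of cells 0..5, where each step pays the index of the new cell.
N_STATES = 6
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Move, clamp to the ends of the corridor, and return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, float(next_state)


def q_learning(n_steps=20000, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    state = 0
    for _ in range(n_steps):
        # Purely random exploration, chosen for simplicity.
        action = rng.choice(ACTIONS)
        next_state, reward = step(state, action)
        # Every single interaction (state, action, reward, next state)
        # immediately updates the value estimate for that state-action pair.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """The learned policy is explicitly a function from states to actions."""
    return tuple(max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES))


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Unlike a search that scores whole lifetimes, this update uses which state was visited and which action was taken at every step, which is the structure the paragraph above says evolutionary methods ignore.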
