Learning to Communicate with Deep Multi-Agent Reinforcement Learning

阿新 • Published: 2018-12-16

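This is a NIPS 2016 paper. I had first read a related paper from 2018 but did not really follow it. This is the first paper I have read on communication (multi-agent RL with communication), so my understanding is not very thorough.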

A brief overview:

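In a multi-agent (MA) environment, agents need to cooperate to complete a task, which requires them to communicate with one another. The earlier papers had no communication between agents, or at least did not define it explicitly as a feature.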

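Communication presupposes a setting that is both cooperative and partially observable; otherwise there is little point in communicating.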

Training in this paper uses centralised learning with decentralised execution.

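In this paper, centralised learning mainly means that communication between agents is unrestricted during learning, whereas at execution time agents can only communicate through a limited-bandwidth channel.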

To let agents come up with an effective communication protocol on their own, the paper proposes two approaches:

The first, RIAL (reinforced inter-agent learning), uses DRQN to handle partial observability.

The second is DIAL (differentiable inter-agent learning). RIAL is end-to-end trainable within a single agent (no gradients pass between agents), whereas DIAL is end-to-end trainable across multiple agents.

Related Work:

Independent DQN: each agent simultaneously learns its own Q-function.
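To make "each agent learns its own Q-function" concrete, here is a minimal sketch. It uses tabular Q-learning rather than a deep network, and the class name `IndependentQLearner` and the one-step toy game are my own assumptions for illustration, not something from the paper.

```python
import random
from collections import defaultdict


class IndependentQLearner:
    """One tabular Q-learner. Every agent owns a separate instance (hypothetical helper)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0] * n_actions)  # obs -> list of Q-values
        self.rng = random.Random(seed)

    def act(self, obs):
        # Epsilon-greedy over this agent's own Q-values only.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[obs]
        return values.index(max(values))

    def update(self, obs, action, reward, next_obs, done):
        # Standard Q-learning target; the other agent never appears in it.
        target = reward if done else reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


# Two agents with no shared parameters and no communication.
agents = [IndependentQLearner(n_actions=2, seed=i) for i in range(2)]

# Toy one-step cooperative game (an assumption): team reward 1 only if both pick action 1.
for _ in range(2000):
    actions = [agent.act("s") for agent in agents]
    reward = 1.0 if actions == [1, 1] else 0.0
    for agent, action in zip(agents, actions):
        agent.update("s", action, reward, "s", done=True)
```

Each agent only ever sees its own action and the shared reward, so from its point of view the other agent is just part of the environment.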

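Deep Recurrent Q-Networks: both DQN and independent DQN assume full observability. In a partially observable environment the state s_t is hidden, and the agent only receives an observation o_t that is correlated with s_t. DRQN was proposed for this setting: it replaces the first fully connected layer with an LSTM, so the network can remember information over longer time spans and pick up more useful information.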
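The idea of a recurrent Q-function can be sketched as below. This is only an illustration under my own assumptions: it uses a plain tanh recurrence as a stand-in for the LSTM layer, random untrained weights, and a hypothetical class name `TinyRecurrentQ`. The point is that the Q-values depend on the hidden state, i.e. on the observation history, not just on the current observation.

```python
import math
import random


class TinyRecurrentQ:
    """Q-values computed from a hidden state that summarises past observations.

    A simple tanh recurrence is used here in place of DRQN's LSTM layer.
    """

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_in = mat(hidden_dim, obs_dim)      # observation -> hidden
        self.w_rec = mat(hidden_dim, hidden_dim)  # previous hidden -> hidden
        self.w_out = mat(n_actions, hidden_dim)   # hidden -> Q-values

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, obs, h_prev):
        # New hidden state mixes the current observation with the previous hidden state.
        h = [
            math.tanh(
                sum(w * x for w, x in zip(self.w_in[i], obs))
                + sum(w * hp for w, hp in zip(self.w_rec[i], h_prev))
            )
            for i in range(self.hidden_dim)
        ]
        q = [sum(w * hj for w, hj in zip(row, h)) for row in self.w_out]
        return q, h


def q_after(net, history):
    """Feed a sequence of observations and return the Q-values at the last step."""
    h = net.initial_state()
    q = None
    for obs in history:
        q, h = net.step(obs, h)
    return q


net = TinyRecurrentQ(obs_dim=2, hidden_dim=3, n_actions=2)
# Same final observation, different pasts.
q_a = q_after(net, [[1.0, 0.0], [0.0, 0.0]])
q_b = q_after(net, [[0.0, 1.0], [0.0, 0.0]])
```

A feed-forward DQN would return identical Q-values for `q_a` and `q_b`, because the last observation is the same in both cases.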

Setting:

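Because a protocol is a mapping from action-observation histories to sequences of messages, the space of protocols is extremely high-dimensional. Automatically discovering effective protocols in this space remains an elusive challenge. In particular, exploring the protocol space is made harder by the need for agents to coordinate both the sending and the interpreting of messages.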

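For example, if one agent sends a useful message to another, it receives a positive reward only if the receiving agent correctly interprets and acts on that message. If the receiver does not, the sender is discouraged from sending that message again. Positive rewards are therefore sparse: they arise only when sending and interpreting are properly coordinated, which is hard to discover through random exploration.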
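To get a feel for how rare a coordinated protocol is, here is a small counting sketch. The setup is a toy game of my own, not an experiment from the paper: a sender sees one of `k` states and sends one of `k` messages, a receiver maps the message to a guess, and the pair "works" only if the guess is right for every state.

```python
from itertools import product


def count_working_protocols(k):
    """Count (sender, receiver) pairs that decode every one of k states correctly."""
    states = range(k)
    senders = list(product(range(k), repeat=k))    # sender[s] = message sent in state s
    receivers = list(product(range(k), repeat=k))  # receiver[m] = guess after message m
    working = sum(
        all(receiver[sender[s]] == s for s in states)
        for sender in senders
        for receiver in receivers
    )
    return working, len(senders) * len(receivers)


working, total = count_working_protocols(3)
print(working, total)  # 6 of 729 deterministic protocol pairs work
```

Even in this tiny game, fewer than 1% of the possible sender/receiver pairs are rewarded on every state, and the fraction shrinks quickly as `k` grows.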

The two proposed algorithms are described in more detail below:

Reinforced Inter-Agent Learning:

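RIAL combines DRQN with independent Q-learning. Each agent's Q-function has the form $Q^a(o^a_t, m^{a'}_{t-1}, h^a_{t-1}, u^a)$: it conditions on the agent's own observation $o^a_t$, the messages $m^{a'}_{t-1}$ received from the other agents at the previous step, and its own hidden state $h^a_{t-1}$.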

RIAL disables experience replay during training, because in a non-stationary environment stored experience can become obsolete or misleading.

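RIAL uses parameter sharing, i.e. all agents share a single network. In total two Q-functions are learned, $Q_u$ for the environment action $u$ and $Q_m$ for the message $m$.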
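As I understand it, the two Q-functions let the environment action and the message be picked separately, so the network needs |U| + |M| outputs rather than |U| x |M|. A minimal sketch of that selection step follows; the function names and the example Q-values are placeholders of mine.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random index with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def select_action_and_message(q_u, q_m, epsilon, rng):
    """Choose the environment action u and the message m independently."""
    u = epsilon_greedy(q_u, epsilon, rng)
    m = epsilon_greedy(q_m, epsilon, rng)
    return u, m


rng = random.Random(0)
q_u = [0.1, 0.9, 0.3]  # made-up Q-values for 3 environment actions
q_m = [0.7, 0.2]       # made-up Q-values for 2 possible messages
u, m = select_action_and_message(q_u, q_m, epsilon=0.0, rng=rng)
print(u, m)  # 1 0
```

With parameter sharing, the same network would produce `q_u` and `q_m` for every agent, with the agent's index passed as an extra input so that agents can still behave differently.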

Differentiable Inter-Agent Learning:

RIAL is not the main point of the paper; the main point is this algorithm, DIAL.

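RIAL still has a problem: agents give each other no feedback about their communication, unlike in human conversation. In face-to-face communication, for example, listeners send quick non-verbal cues to the speaker to signal their level of understanding and interest. RIAL lacks this feedback mechanism.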

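The paper therefore proposes differentiable inter-agent learning (DIAL). The key point is that combining centralised learning with Q-networks makes it possible not only to share parameters but also to push gradients from one agent to another through the communication channel. So while RIAL is end-to-end trainable within each agent, DIAL is end-to-end trainable across agents. Letting gradients flow from one agent to another gives the agents richer feedback, reduces the amount of learning needed by trial and error, and eases the discovery of effective protocols.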
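A tiny worked example of "pushing gradients through the channel", under heavy simplifying assumptions of mine: the sender and the receiver are each a single scalar weight, the message is a real number, and the loss is a squared error on the receiver's output. The gradients are written out by hand with the chain rule.

```python
def forward(w_s, w_r, x):
    m = w_s * x  # sender: real-valued message computed from its input
    q = w_r * m  # receiver: value estimate computed from the received message
    return m, q


def grads(w_s, w_r, x, target):
    """Gradients of L = (q - target)^2 with respect to both agents' weights."""
    m, q = forward(w_s, w_r, x)
    d_q = 2.0 * (q - target)  # dL/dq, computed on the receiver's side
    d_w_r = d_q * m           # gradient for the receiver's weight
    d_m = d_q * w_r           # gradient arriving at the channel
    d_w_s = d_m * x           # the same gradient pushed back into the sender
    return d_w_s, d_w_r


w_s, w_r, x, target = 0.5, 0.5, 1.0, 1.0
for _ in range(200):
    g_s, g_r = grads(w_s, w_r, x, target)
    w_s -= 0.1 * g_s  # the sender is updated from the receiver's loss
    w_r -= 0.1 * g_r
```

The sender never receives a reward signal of its own here; its weight changes only because the receiver's error is backpropagated through the message `m`.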

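DIAL works as follows: during centralised learning, communication actions are replaced with direct connections between the output of one agent's network and the input of another's. Thus, even though the task restricts communication to discrete messages, during learning the agents are free to send real-valued messages to each other. Since these messages function like any other network activation, gradients can be passed back along the channel, allowing end-to-end backpropagation across agents.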
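In the paper, the unit that sits on this channel is called the discretise/regularise unit (DRU): during centralised learning it adds Gaussian noise to the real-valued message and squashes it with a logistic function, and during decentralised execution it thresholds the message to a single bit. Below is a rough sketch of that behaviour; the function signature and the default noise level are my own choices for illustration.

```python
import math
import random


def dru(message, training, sigma=2.0, rng=random):
    """Sketch of a discretise/regularise unit for a scalar message."""
    if training:
        # Centralised learning: noisy, real-valued, differentiable in `message`.
        noisy = message + rng.gauss(0.0, sigma)
        return 1.0 / (1.0 + math.exp(-noisy))
    # Decentralised execution: hard 1-bit message through the limited channel.
    return 1.0 if message > 0 else 0.0


rng = random.Random(0)
train_value = dru(0.7, training=True, rng=rng)  # some value strictly between 0 and 1
exec_value = dru(0.7, training=False)           # 1.0
```

The noise during training pushes the sender towards messages far from zero, so that the bit sent at execution time carries the same information as the real-valued message did during learning.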

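The paper gives pseudocode for DIAL. To avoid errors in my own explanation, please refer to the description in the original English paper.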
