Mastering the game of Go with deep neural networks and tree search

阿新 • Published: 2017-06-12

Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.

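This is the AlphaGo paper. It relies mainly on reinforcement learning (RL); I am not sure whether anyone had applied RL to Go before this.

The paper proposes two networks, a policy network and a value network. Both depend on self-play: the policy network is improved by RL on self-play games, and the value network is trained on positions from those games.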

Policy network:

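Given the current board and its move history, the policy network outputs a probability for every possible next move. Earlier work seems to have trained such networks with supervised learning on games played by human experts; here RL is applied on top of that, and it appears to work better than supervised training alone. The RL policy network's parameters are initialized from those of the supervised network.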
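As a rough illustration of what "a probability for every possible next move" means, the sketch below turns one raw score per board point into a distribution over legal moves using a masked softmax. The function name `move_probabilities`, the flat list of scores, and the legality mask are assumptions made for this example; the actual network is a deep convolutional net whose details are not reproduced here.

```python
import math

BOARD_POINTS = 19 * 19  # 361 intersections on a full-size board


def move_probabilities(scores, legal):
    """Masked softmax: turn raw per-point scores into move probabilities.

    scores: list of floats, one per board point (hypothetical network output).
    legal:  list of bools, True where a stone may be played.
    Illegal points get probability 0; the legal ones sum to 1.
    """
    if len(scores) != len(legal):
        raise ValueError("scores and legal must have the same length")
    legal_scores = [s for s, ok in zip(scores, legal) if ok]
    if not legal_scores:
        raise ValueError("no legal move available")
    top = max(legal_scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, legal)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy usage: uniform scores, with the first point occupied (illegal).
toy_scores = [0.0] * BOARD_POINTS
toy_legal = [False] + [True] * (BOARD_POINTS - 1)
toy_probs = move_probabilities(toy_scores, toy_legal)
```

With uniform scores, every legal point receives the same probability and the occupied point receives zero.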

Value network:

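Given the current board and its move history, the value network estimates how likely the current player is to win. It was originally intended to replace the random-rollout estimates of Monte Carlo search, but combining the value network with rollouts turned out to work better. Personally, I suspect that if the value network were trained well enough, rollouts might not be needed at all. The rollouts here are not purely random either: they appear to be played by a shallow policy trained with supervised learning.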
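The combination can be written as a weighted average of the two estimates. The sketch below follows the mixing form given in the paper, V = (1 - λ)·v + λ·z, where v is the value network's estimate for a leaf position and z is the outcome of a rollout from that position. The function and argument names are made up for this example; the default λ = 0.5 is the setting the paper reports as performing best.

```python
def leaf_value(value_net_estimate, rollout_outcome, mix=0.5):
    """Blend a value-network estimate with a rollout result.

    value_net_estimate: v, the network's evaluation of the leaf position.
    rollout_outcome:    z, the result of playing the position out (+1 win, -1 loss).
    mix:                lambda in [0, 1]; 0 trusts only the network,
                        1 trusts only the rollout.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return (1.0 - mix) * value_net_estimate + mix * rollout_outcome
```

Setting `mix=0.0` corresponds to the "no rollouts" scenario speculated about above, and `mix=1.0` to a search that ignores the value network.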

The policy network reduces the breadth of the Monte Carlo search tree, and the value network reduces its depth.
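One way to see the breadth reduction is the move-selection step of the tree search: each candidate move gets an exploration bonus proportional to its policy-network prior, so moves the policy considers unlikely are rarely expanded. The sketch below is a simplified version of that kind of rule (action value plus a prior-weighted bonus that shrinks as a move is visited more often). The `Edge` class, the `select_move` helper, and the constant `c_puct=5.0` are assumptions for illustration, not the paper's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float              # P(s, a): policy-network probability of the move
    visits: int = 0           # N(s, a): how often the search has tried the move
    total_value: float = 0.0  # sum of leaf evaluations backed up through it

    @property
    def q(self):
        """Mean backed-up value; 0 for a move that has never been visited."""
        return self.total_value / self.visits if self.visits else 0.0


def select_move(edges, c_puct=5.0):
    """Pick the move maximizing Q + u, with u = c * P * sqrt(sum N) / (1 + N)."""
    total_visits = sum(e.visits for e in edges.values())
    # Assumption: clamp to 1 so the prior still matters before any visit.
    sqrt_total = math.sqrt(max(total_visits, 1))

    def score(item):
        _, edge = item
        return edge.q + c_puct * edge.prior * sqrt_total / (1 + edge.visits)

    return max(edges.items(), key=score)[0]


# Toy usage: with no visits yet, the high-prior move is chosen first.
fresh = {"D4": Edge(prior=0.6), "A1": Edge(prior=0.01)}
first_choice = select_move(fresh)
```

Because the bonus decays with the visit count, a high-prior move that keeps returning poor evaluations eventually loses out to other candidates, while a move with a near-zero prior may never be tried at all.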

The program described in this paper was the first to defeat a human professional player (Fan Hui, a 2-dan professional).

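The method also comes in a distributed version and a single-machine version. By the authors' own assessment, the single-machine version plays at about Fan Hui's level, while the distributed version reaches professional 5 dan or above. The distributed version used 40 search threads, 1,202 CPUs, and 176 GPUs. The single-machine version used 40 search threads, 48 CPUs, and 8 GPUs. Given these configurations, I would guess that within 10 years a single laptop will be able to run a Go program stronger than professional 3 dan, which is good news for anyone learning Go.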

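AlphaGo made RL popular, made Go popular, and made Ke Jie famous, so its impact has been enormous. Go is fairly easy to formalize and its rules are fairly simple; only the search space is large. Many real-world problems, however, have complicated rules, incomplete information, large state spaces, large decision spaces, and a need for joint decision-making. AlphaGo is still evolving, and more papers should follow.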
