Reinforcement Learning: Setting Up mujoco, mujoco_py, gym, and baselines

阿新 • Published: 2019-01-20

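Like other areas of machine learning, reinforcement learning has its classic experimental scenarios, such as Mountain-Car and Cart-Pole. With the rise of deep reinforcement learning in recent years, new and more complex scenarios keep appearing, and with them a series of excellent platforms: OpenAI Gym, MuJoCo, rllab, DeepMind Lab, TORCS, PySC2, and others.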

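Author's environment
Ubuntu 16.04
Anaconda2
Python 3.6 (creating a fresh conda environment is recommended; all steps below were performed inside a conda environment)
tensorflow-gpu 1.4.1 (baselines requires at least 1.4.1)
CUDA 8.0 (for CUDA installation, see https://blog.csdn.net/Hansry/article/details/81008210)
cuDNN 6.0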

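1. Installing MuJoCo

MuJoCo (Multi-Joint dynamics with Contact) is a physics simulator that can be used for research on robot control, optimization, and related topics.
1. Preparation
Download mjpro150 linux from the official website, and click License to request a license. The form asks for your full name, email address, and computer id. To obtain the computer id, download getid_linux (an executable) for your platform and run it as follows: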

$ chmod a+x getid_linux   # make it executable
$ ./getid_linux

The output looks something like LINUX_A1EHAO_Q8BPHTIM10F05D0S3TB3293

After clicking Submit, download the license file mjkey.txt from the email sent to the address you entered.

2. Environment setup
2.1 Create a hidden folder and copy mjpro150_linux into it

mkdir ~/.mujoco    
cp mjpro150_linux.zip ~/.mujoco
cd ~/.mujoco
unzip mjpro150_linux.zip

2.2 Copy the license file mjkey.txt into the hidden folder

cp mjkey.txt ~/.mujoco  
cp mjkey.txt ~/.mujoco/mjpro150/bin

2.3 Add environment variables: open ~/.bashrc and append the following lines

export LD_LIBRARY_PATH=~/.mujoco/mjpro150/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export MUJOCO_KEY_PATH=~/.mujoco
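
Before moving on, it can help to confirm that the files from steps 2.1 and 2.2 ended up where expected. The following is a minimal sketch, not part of MuJoCo or mujoco_py; the helper name check_mujoco_setup is made up for this example, and the paths are simply the ones used above.

```python
import os


def check_mujoco_setup(root):
    """Return the expected MuJoCo paths that are missing under `root`.

    Hypothetical helper: the list of paths mirrors steps 2.1 and 2.2 above.
    """
    expected = [
        os.path.join(root, "mjkey.txt"),
        os.path.join(root, "mjpro150", "bin"),
        os.path.join(root, "mjpro150", "bin", "mjkey.txt"),
    ]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    missing = check_mujoco_setup(os.path.expanduser("~/.mujoco"))
    print("missing:", missing if missing else "nothing")
```

An empty list means that all three paths exist; it does not verify that the license itself is valid.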

3. Test run

cd ~/.mujoco/mjpro150/bin
./simulate ../model/humanoid.xml

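2. Installing mujoco_py

First, download and install mujoco_py from its official source. Note that the installation may complain about many missing packages; just install whatever it asks for.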

 pip3 install -U 'mujoco-py<1.50.2,>=1.50.1'

If the following runs without errors, the installation succeeded:

>>> import mujoco_py
>>> from os.path import dirname
>>> model = mujoco_py.load_model_from_path(dirname(dirname(mujoco_py.__file__))  +"/xmls/claw.xml")
>>> sim = mujoco_py.MjSim(model)
>>> print(sim.data.qpos)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
>>> sim.step()
>>> print(sim.data.qpos)
[ 2.09217903e-06 -1.82329050e-12 -1.16711384e-07 -4.69613872e-11
 -1.43931860e-05  4.73350204e-10 -3.23749942e-05 -1.19854057e-13
 -2.39251380e-08 -4.46750545e-07  1.78771599e-09 -1.04232280e-08]

A problem you will almost certainly run into is:

Creating window glfw
ERROR: GLEW initialization error: Missing GL version

Press Enter to exit ...

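Adding export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-390 to ~/.bashrc fixes the problem (the directory name depends on the installed NVIDIA driver version; 390 is the one on the author's machine).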
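
To confirm that a new shell has picked up the change, one option is to check the variable from Python. This is only a sketch: the function name in_library_path is made up here, and the driver directory should be replaced with the one on your machine.

```python
import os


def in_library_path(directory, ld_library_path=None):
    """Return True if `directory` is one of the entries of LD_LIBRARY_PATH.

    Hypothetical helper; pass `ld_library_path` explicitly to check a string
    instead of the current environment.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    wanted = os.path.normpath(directory)
    # Entries are separated by ':' on Linux; skip empty entries.
    entries = [entry for entry in ld_library_path.split(":") if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    print(in_library_path("/usr/lib/nvidia-390"))
```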

Then run python body_interaction.py

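3. Installing gym

OpenAI Gym is OpenAI's toolkit for research on reinforcement learning algorithms. It covers a wide range of scenarios, from the classic Cart-Pole and Mountain-Car to Atari, Go, and MuJoCo. The official website is https://gym.openai.com/ and the source code is at https://github.com/openai/gym. Its README provides installation and usage examples; following the installation instructions there:
Minimal install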

git clone https://github.com/openai/gym.git
cd gym
pip install -e .

Full install

apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig libglfw3-dev
pip install Pillow
pip install -e '.[all]'

In ~/gym/examples/scripts, running ./list_envs shows that gym ships with many environments.
An environment can be created and run as in the following code:

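import gym

# Hero-ram-v0 is an Atari environment, so it needs the Atari extras from the full install
env = gym.make('Hero-ram-v0')
for i_episode in range(20):
    observation = env.reset()  # start a new episode
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()  # pick a random action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()  # release the render window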
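
The loop above relies on only two calls: reset() returns the first observation, and step(action) returns the tuple (observation, reward, done, info). The sketch below illustrates that contract with a tiny stand-in environment so that it runs without gym installed. CountdownEnv, sample_action, and run_episode are made-up names for this example and are not part of gym.

```python
import random


class CountdownEnv:
    """Hypothetical stand-in for a gym environment (not part of gym)."""

    def __init__(self, start=5, seed=0):
        self.start = start
        self.rng = random.Random(seed)
        self.remaining = start

    def reset(self):
        self.remaining = self.start
        return self.remaining  # the initial observation

    def sample_action(self):
        # Plays the role of env.action_space.sample(): 0 = wait, 1 = count down.
        return self.rng.choice([0, 1])

    def step(self, action):
        self.remaining -= action
        done = self.remaining <= 0
        reward = 1.0 if done else 0.0
        return self.remaining, reward, done, {}


def run_episode(env, max_steps=100):
    """Run one episode with random actions; return (steps taken, total reward)."""
    env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        _, reward, done, _ = env.step(env.sample_action())
        total_reward += reward
        if done:
            return t + 1, total_reward
    return max_steps, total_reward


if __name__ == "__main__":
    print(run_episode(CountdownEnv(start=3)))
```

Any object that offers reset() and step() in this shape can be dropped into the same loop, which is what makes it easy to swap one gym environment for another.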

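4. Installing baselines

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms. It requires python>=3.5 as well as OpenMPI and zlib. Some of the baselines examples rely on the MuJoCo physics simulator, which can be installed by following the steps above.

4.1 Download the baselines package and set it up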

git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .

4.2 Install tensorflow-gpu. Since our CUDA and cuDNN versions are 8.0 and 6.0 respectively, we need the matching tensorflow-gpu release

conda install tensorflow-gpu=1.4.1

4.3 To run the tests for the algorithms in baselines, install pytest

pip install pytest
pytest

Output like the following means the setup succeeded

================================================================================== test session starts ==================================================================================
platform linux -- Python 3.6.6, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/hansry/append/RL/baselines, inifile:
collected 12 items                                                                                                                                                                      

baselines/common/test_identity.py

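Install any other missing packages as they come up.

5. Using HER (Hindsight Experience Replay) in baselines

The folder /baselines/baselines/her/experiment contains config.py, play.py, train.py, and other files.

config.py is where the parameters are generally set, for example: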

DEFAULT_PARAMS = {
    # env
    'max_u': 1.,  # max absolute value of actions on different coordinates
    # ddpg
    'layers': 3,  # number of layers in the critic/actor networks
    'hidden': 256,  # number of neurons in each hidden layers
    'network_class': 'baselines.her.actor_critic:ActorCritic',
    'Q_lr': 0.001,  # critic learning rate
    'pi_lr': 0.001,  # actor learning rate
    'buffer_size': int(1E6),  # for experience replay
    'polyak': 0.95,  # polyak averaging coefficient
    'action_l2': 1.0,  # quadratic penalty on actions (before rescaling by max_u)
    'clip_obs': 200.,
    'scope': 'ddpg',  # can be tweaked for testing
    'relative_goals': False,
    # training
    'n_cycles': 50,  # per epoch
    'rollout_batch_size': 2,  # per mpi thread
    'n_batches': 40,  # training batches per cycle
    'batch_size': 256,  # per mpi thread, measured in transitions and reduced to even multiple of chunk_length.
    'n_test_rollouts': 10,  # number of test rollouts per epoch, each consists of rollout_batch_size rollouts
    'test_with_polyak': False,  # run test episodes with the target network
    # exploration
    'random_eps': 0.3,  # percentage of time a random action is taken
    'noise_eps': 0.2,  # std of gaussian noise added to not-completely-random actions as a percentage of max_u
    # HER
    'replay_strategy': 'future',  # supported modes: future, none
    'replay_k': 4,  # number of additional goals used for replay, only used if off_policy_data=future
    # normalization
    'norm_eps': 0.01,  # epsilon used for observation normalization
    'norm_clip': 5,  # normalized observations are cropped to this values
}
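
As an illustration of one of these settings, 'polyak' controls how slowly the target network follows the main network. The sketch below assumes the common convention target = polyak * target + (1 - polyak) * main and uses plain lists of floats in place of network weights. The function name polyak_update is made up here; the exact form used by baselines should be checked in its HER DDPG code.

```python
def polyak_update(target, main, polyak=0.95):
    """Move each target parameter a small step toward the main parameter.

    Assumed convention: target = polyak * target + (1 - polyak) * main.
    """
    return [polyak * t + (1.0 - polyak) * m for t, m in zip(target, main)]


if __name__ == "__main__":
    target, main = [0.0], [1.0]
    for _ in range(3):
        target = polyak_update(target, main)
    print(target)  # slowly approaches the main value 1.0
```

With polyak = 0.95, each update closes 5% of the remaining gap, so a larger value means a slower and smoother target.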

train.py trains the DDPG+HER parameters (such as the neural-network weights). It also accepts a number of optional command-line options:

@click.option('--env', type=str, default='FetchReach-v1', help='the name of the OpenAI Gym environment that you want to train on')  # selects the task HER runs on
@click.option('--logdir', type=str, default=None, help='the path to where logs and policy pickles should go. If not specified, creates a folder in /tmp/')  # where the trained policy parameters are saved
@click.option('--n_epochs', type=int, default=50, help='the number of training epochs to run')  # number of training iterations (epochs)
@click.option('--num_cpu', type=int, default=1, help='the number of CPU cores to use (using MPI)')  # how many CPUs to use
@click.option('--seed', type=int, default=0, help='the random seed used to seed both the environment and the training code')
@click.option('--policy_save_interval', type=int, default=5, help='the interval with which policy pickles are saved. If set to 0, only the best and latest policy will be pickled.')
@click.option('--replay_strategy', type=click.Choice(['future', 'none']), default='future', help='the HER replay strategy to be used. "future" uses HER, "none" disables HER.')
@click.option('--clip_return', type=int, default=1, help='whether or not returns should be clipped')

The training command is python train.py --num_cpu=2 (the options are optional).
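
The 'future' replay strategy selected by --replay_strategy can be pictured as follows: for each stored transition, a few extra copies are added in which the goal is replaced by something the agent actually achieved later in the same episode, and the reward is recomputed for that new goal. The code below is a simplified sketch of that idea, not the baselines implementation. The dict layout, the function names, and the 0 / -1 reward convention are all assumptions made for this example.

```python
import random


def sparse_reward(achieved_goal, goal):
    # Assumed reward convention: 0 when the goal is reached, -1 otherwise.
    return 0.0 if achieved_goal == goal else -1.0


def relabel_future(episode, replay_k=4, rng=None):
    """Return the original transitions plus replay_k relabelled copies of each.

    `episode` is a list of dicts with the keys 'goal' and 'achieved_goal'
    (a hypothetical layout chosen for this sketch).
    """
    if rng is None:
        rng = random.Random(0)
    last = len(episode) - 1
    out = []
    for t, step in enumerate(episode):
        achieved = step["achieved_goal"]
        # Keep the transition with its original goal.
        out.append({**step, "reward": sparse_reward(achieved, step["goal"])})
        for _ in range(replay_k):
            # Pretend that something achieved at step t or later was the goal.
            new_goal = episode[rng.randint(t, last)]["achieved_goal"]
            out.append({**step, "goal": new_goal,
                        "reward": sparse_reward(achieved, new_goal)})
    return out


if __name__ == "__main__":
    episode = [{"goal": 9, "achieved_goal": i} for i in range(3)]
    print(len(relabel_future(episode)))  # 3 originals + 3 * 4 relabelled copies
```

Even when the original goal is never reached, some relabelled copies carry a success reward, which gives the learner a useful signal under sparse rewards.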

play.py loads the trained parameters and runs the policy. The command is:

python play.py policy_best.pkl   (the trained policy file must follow as the argument)
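
The help text of --policy_save_interval calls the saved files "policy pickles", so a .pkl file such as policy_best.pkl is presumably an ordinary Python pickle. The round trip below is a sketch under that assumption. It uses a plain dict as a stand-in for the policy object, and the helper names save_policy and load_policy are made up for this example.

```python
import os
import pickle
import tempfile


def save_policy(policy, path):
    """Write a policy object to `path` as a pickle (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(policy, f)


def load_policy(path):
    """Read a pickled policy object back from `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    stand_in_policy = {"hidden": 256, "layers": 3}  # not a real baselines policy
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "policy_best.pkl")
        save_policy(stand_in_policy, path)
        print(load_policy(path))
```

Unpickling executes code embedded in the file, so only load policy files that come from a source you trust.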
