Installing NVIDIA-DOCKER

nvidia-gpu-docker
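1. Installing CUDA on the host
1.1 The nouveau driver problem
nouveau is a display driver that ships with the system. It must be disabled before continuing; otherwise the graphics driver installer reports "You appear to be running an X server …" and the installation fails. Open the two files below (create them if they do not exist), add the following two lines to each, and save.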
vim /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
vim /lib/modprobe.d/nvidia-installer-disable-nouveau.conf

blacklist nouveau
options nouveau modeset=0
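The manual edit above can also be scripted. The following is a minimal sketch, not the installer's own method: write_nouveau_blacklist is a hypothetical helper name, and it takes the target directory as an argument so it can be tried on a scratch directory before touching /etc/modprobe.d or /lib/modprobe.d (which require root).

```shell
#!/bin/sh
# Hypothetical helper: write the two blacklist lines into DIR.
write_nouveau_blacklist() {
    dir="$1"
    mkdir -p "$dir" || return 1
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
        > "$dir/nvidia-installer-disable-nouveau.conf"
}

# Dry run against a scratch directory; on a real host the targets would be
# /etc/modprobe.d and /lib/modprobe.d.
scratch=$(mktemp -d)
write_nouveau_blacklist "$scratch"
cat "$scratch/nvidia-installer-disable-nouveau.conf"
```

After a reboot, lsmod | grep nouveau should print nothing if the blacklist took effect. On many systems the initramfs also has to be regenerated before rebooting; check your distribution's documentation for the exact command.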
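1.2 GCC issues
This one is easy to overlook: the NVIDIA driver installer builds a kernel module, so gcc must be installed on the host.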
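A quick pre-flight check can catch a missing compiler before the driver installer does. This is only a sketch: missing_tools is a hypothetical helper, and checking for make alongside gcc is my assumption about what a kernel module build usually needs.

```shell
#!/bin/sh
# Hypothetical helper: print each named tool that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools gcc make)
if [ -n "$missing" ]; then
    echo "missing build tools:" $missing
else
    echo "gcc and make are available"
fi
```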
1.3 Kernel issues
# Reference: https://unix.stackexchange.com/questions/115289/driver-install-kernel-source-not-found
yum -y install kernel-devel kernel-headers
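The "kernel source not found" error in the reference above typically means that the installed kernel-devel package does not match the running kernel. Here is a small sketch for checking this, assuming the CentOS/RHEL layout where kernel-devel installs under /usr/src/kernels/<release>. The helper names are hypothetical, and the ROOT argument exists only so the check can be exercised against a scratch tree.

```shell
#!/bin/sh
# Hypothetical helpers. ROOT is a path prefix ("" for the real filesystem).
kernel_source_dir() {
    printf '%s/usr/src/kernels/%s\n' "$1" "$2"
}

has_kernel_source() {
    [ -d "$(kernel_source_dir "$1" "$2")" ]
}

release=$(uname -r)
if has_kernel_source "" "$release"; then
    echo "kernel source found for $release"
else
    # Installing the packages for the exact running release is a common fix.
    echo "no kernel source for $release; try: yum -y install kernel-devel-$release kernel-headers-$release"
fi
```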
2. Installing Docker
2.1 Installing on Ubuntu
# Just install it this way; don't overcomplicate it.
apt install docker.io
2.2 Installing on CentOS
# Same here: keep it simple.
yum install docker
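Because the nvidia-docker package has to match the installed Docker version (see section 3.4), it helps to record that version right after installing. This is a minimal sketch: parse_docker_version is a hypothetical helper, and it assumes the usual "Docker version X.Y.Z, build ..." output format of docker --version.

```shell
#!/bin/sh
# Hypothetical helper: read a "Docker version X.Y.Z, build ..." line on stdin
# and print only the version field.
parse_docker_version() {
    sed -n 's/^Docker version \([^, ]*\).*/\1/p'
}

if command -v docker >/dev/null 2>&1; then
    docker --version | parse_docker_version
else
    echo "docker is not on PATH"
fi
```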
3. NVIDIA-DOCKER
3.1 Ubuntu 14.04/16.04/18.04, Debian Jessie/Stretch
Ubuntu will install docker.io by default, which isn't the latest version of Docker Engine. This implies that you will need to pin the version of nvidia-docker. See more information here.
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
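The distribution=$(...) line in the commands above sources /etc/os-release in a subshell and concatenates ID and VERSION_ID to choose the repository path. The sketch below isolates that step so it can be tried on a sample file; distribution_id is a hypothetical helper name and the sample file contents are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print ID followed by VERSION_ID from an os-release file.
# The subshell keeps the sourced variables out of the calling shell.
distribution_id() {
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Illustrative sample; on a real host pass /etc/os-release instead.
sample=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="16.04"\n' > "$sample"
distribution_id "$sample"   # prints ubuntu16.04
```

With that value, the repository list URL used above becomes https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list.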
3.2 CentOS 7 (docker-ce), RHEL 7.4/7.5 (docker-ce), Amazon Linux 1/2
If you are not using the official docker-ce package on CentOS/RHEL, use the next section.
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo yum install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
If yum reports a conflict on /etc/docker/daemon.json with the docker package, you need to use the next section instead.
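The conflict mentioned here involves /etc/docker/daemon.json, the file through which the nvidia runtime is registered with the Docker daemon. As a rough sketch, the following checks whether a daemon.json contains an "nvidia" entry. has_nvidia_runtime is a hypothetical helper, the sample JSON is my assumption of what nvidia-docker2 typically ships, and a plain grep is only an approximation of real JSON parsing.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains a "nvidia": key.
has_nvidia_runtime() {
    grep -q '"nvidia"[[:space:]]*:' "$1"
}

# Illustrative sample; on a real host pass /etc/docker/daemon.json instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

if has_nvidia_runtime "$sample"; then
    echo "nvidia runtime is registered"
else
    echo "nvidia runtime is not registered"
fi
```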
For docker-ce on ppc64le, look at the FAQ.
3.3 CentOS 7 (docker), RHEL 7.4/7.5 (docker)
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo

# Install the nvidia runtime hook
sudo yum install -y nvidia-container-runtime-hook

# Test nvidia-smi with the latest official CUDA image
# You can't use `--runtime=nvidia` with this setup.
docker run --rm nvidia/cuda:9.0-base nvidia-smi
3.4 Docker and the current nvidia-docker version do not match
List the nvidia-docker versions available for installation:
yum search --showduplicates nvidia-docker
The output is shown in the image below:
[Image: output of yum search --showduplicates nvidia-docker]
Choose the nvidia-docker version you need from this list. The Docker I have installed is version 1.12.6, so I chose the last nvidia-docker version in the list.
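Picking the matching version can be sketched as a small filter. This assumes that each nvidia-docker version string embeds the Docker version it was built for as "docker<version>"; the sample strings below are illustrative, not taken from the listing above, and pick_for_docker is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read version strings on stdin and print the last one
# that contains "docker<VERSION>" not followed by another digit.
pick_for_docker() {
    awk -v tag="docker$1" '
        {
            pos = index($0, tag)
            if (pos == 0) next
            rest = substr($0, pos + length(tag), 1)
            if (rest ~ /[0-9]/) next   # e.g. skip docker1.12.61 for 1.12.6
            chosen = $0
        }
        END { if (chosen != "") print chosen }
    '
}

# Illustrative input only.
printf '%s\n' \
    '2.0.3-1.docker17.03.2.ce' \
    '2.0.3-1.docker1.12.6' \
    '2.0.3-1.docker1.13.1' | pick_for_docker 1.12.6   # prints 2.0.3-1.docker1.12.6
```

The selected string would then be appended to the package name when installing, for example sudo yum install -y nvidia-docker2-<version>, where <version> is whatever the filter printed.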
See reference 1 for the blog post.