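Ubuntu 16.04 + tensorflow-GPU setup, following the official documentation

I. Prepare for CUDA installation

This section covers the preparation work: checking the operating system version, the GPU model, and so on.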

To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter:

$ lspci | grep -i nvidia
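The same check can be scripted. Here is a minimal Python sketch, assuming `lspci` prints one device per line; the helper names `nvidia_devices` and `read_lspci` are my own and not part of the NVIDIA documentation:

```python
import subprocess


def nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA (case-insensitive)."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


def read_lspci():
    """Run lspci and return its text output, or '' if it cannot be run."""
    try:
        result = subprocess.run(
            ["lspci"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    devices = nvidia_devices(read_lspci())
    print("\n".join(devices) if devices else "No NVIDIA device found")
```

An empty result only means that no NVIDIA device is listed; whether a listed device is CUDA-capable still has to be checked against NVIDIA's GPU list.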

The GPU models and families currently supported by CUDA are listed at: https://developer.nvidia.com/cuda-gpus

The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes. To determine which distribution and release number you're running, type the following at the command line:

$ uname -m && cat /etc/*release
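For a scripted version of this check, the sketch below parses the `KEY=value` lines found in `/etc/os-release`. It assumes the usual Ubuntu fields `ID` and `VERSION_ID`; the function names are illustrative only:

```python
import platform


def parse_os_release(text):
    """Parse KEY=value lines (os-release format) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_ubuntu_1604(info):
    """True when the parsed fields describe Ubuntu 16.04."""
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") == "16.04"


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as handle:
            info = parse_os_release(handle.read())
    except OSError:
        info = {}
    # platform.machine() reports the architecture, like `uname -m`.
    print(platform.machine(), info.get("PRETTY_NAME", "unknown"))
    print("Ubuntu 16.04:", is_ubuntu_1604(info))
```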

The gcc compiler is required for development using the CUDA Toolkit. gcc is the GNU Compiler Collection (GCC), a suite of programming-language compilers that handles several languages, such as Java, Ada, C, and C++. It is not required for running CUDA applications. It is generally installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly. To verify the version of gcc installed on your system, type the following on the command line:

$ gcc --version

The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.

While the Runfile installation performs no package validation, the RPM and Deb installations of the driver will make an attempt to install the kernel header and development packages if no version of these packages is currently installed. However, it will install the latest version of these packages, which may or may not match the version of the kernel your system is using. Therefore, it is best to manually ensure the correct version of the kernel headers and development packages are installed prior to installing the CUDA Drivers, as well as whenever you change the kernel version.

The version of the kernel your system is running can be found by running the following command:

That is, check the kernel version manually:

$ uname -r

The kernel headers and development packages for the currently running kernel can be installed with:

That is, install the headers and development packages that match the system's kernel version.

$ sudo apt-get install linux-headers-$(uname -r)
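To make explicit what `$(uname -r)` expands to, here is a small Python sketch that builds the same package name. The helper names are hypothetical, and the script only prints the command instead of running it:

```python
import platform


def headers_package(kernel_release):
    """Name of the Ubuntu headers package for a given `uname -r` string."""
    return "linux-headers-{}".format(kernel_release)


def install_command(kernel_release):
    """Argument list equivalent to the apt-get command above."""
    return ["sudo", "apt-get", "install", headers_package(kernel_release)]


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r` on Linux.
    print(" ".join(install_command(platform.release())))
```

Printing the command first makes it easy to confirm that the package version matches the running kernel before anything is installed.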

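II. Download and install CUDA Toolkit 8.0

(Note: tensorflow 1.3 currently supports only CUDA Toolkit 8.0 + cuDNN 6.0.)

Before installing, check the CUDA and cuDNN versions that the official tensorflow site currently lists as supported. Otherwise you may install the latest version, find that tensorflow does not support it, and have to uninstall and start over.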

  • CUDA® Toolkit 8.0. For details, see NVIDIA's documentation. Ensure that you append the relevant Cuda pathnames to the LD_LIBRARY_PATH environment variable as described in the NVIDIA documentation.
  • The NVIDIA drivers associated with CUDA Toolkit 8.0.
  • cuDNN v6.0. For details, see NVIDIA's documentation. Ensure that you create the CUDA_HOME environment variable as described in the NVIDIA documentation.

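2.1 Download the CUDA toolkit (make sure to download CUDA 8.0)

Select Linux > x86_64 > Ubuntu > 16.04 > deb (local).

2.2 Install CUDA Toolkit 8.0

Enter the following Installation Instructions in a terminal window, in order.

Use cd to enter the folder containing the downloaded file, then run the following commands to install CUDA: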
  1. `$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb`
  2. `$ sudo apt-get update`
  3. `$ sudo apt-get install cuda`

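******** If the commands above install a newer version such as cuda-9-0 instead of cuda-8-0, the fix is as follows **********

I had previously installed the newer cuda-9.1, found that tensorflow did not support it, and then uninstalled and purged cuda-9.1. Reinstalling with the three commands above still automatically installed cuda-9.0 rather than the cuda-8 I wanted.

In summary:

First, uninstall the newer CUDA 9.1 that is already installed: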

$ sudo apt-get --purge remove cuda

$ sudo apt autoremove

Then clean the apt cache:

$ sudo apt-get clean

Finally, reinstall, this time specifying the CUDA version number:

$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb

$ sudo apt-get update

$ sudo apt-get install cuda-8-0

Done!

******************************************
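To confirm which toolkit version actually landed on disk, the sketch below lists the versioned install directories. It assumes the default prefix `/usr/local/cuda-<major>.<minor>`, the same prefix used in the environment setup that follows; `installed_toolkits` is a name I made up for illustration:

```python
import os
import re


def installed_toolkits(prefix="/usr/local"):
    """List CUDA versions found as cuda-<major>.<minor> directories."""
    pattern = re.compile(r"^cuda-(\d+\.\d+)$")
    try:
        names = os.listdir(prefix)
    except OSError:
        return []
    versions = []
    for name in names:
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(prefix, name)):
            versions.append(match.group(1))
    # Plain string sort; good enough for a short list of versions.
    return sorted(versions)


if __name__ == "__main__":
    found = installed_toolkits()
    print("Installed toolkits:", ", ".join(found) or "none")
    if found and found != ["8.0"]:
        print("Warning: expected only 8.0 for tensorflow 1.3")
```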

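2.3 Environment setup: configure the environment variables

Open the .bashrc file in your home directory (it is a hidden file, so press Ctrl+H in the file manager to show hidden files first) and append the following lines at the end of .bashrc: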

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Note: the path here must match your NVIDIA driver version. Run $ cat /proc/driver/nvidia/version in a terminal to see the driver version number.

export LPATH=/usr/lib/nvidia-387:$LPATH

export LIBRARY_PATH=/usr/lib/nvidia-387:$LIBRARY_PATH

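Note: apart from the space after export, do not put any unnecessary spaces in the lines above, or they will not be recognized. The syntax is whitespace-sensitive.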
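The `${PATH:+:${PATH}}` form in the export lines can look cryptic. In bash it expands to `:` followed by the old value only when the variable is already set and non-empty, which avoids a stray trailing colon (an empty entry is treated as the current directory). A small Python sketch that mimics this rule, with `prepend_path` as a hypothetical helper name:

```python
import os


def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: append ':' + VAR only if VAR is non-empty."""
    if current:
        return new_dir + ":" + current
    return new_dir


if __name__ == "__main__":
    # os.environ.get returns None when the variable is unset.
    print(prepend_path("/usr/local/cuda-8.0/bin", os.environ.get("PATH")))
    print(prepend_path("/usr/local/cuda-8.0/lib64",
                       os.environ.get("LD_LIBRARY_PATH")))
```

Both an unset variable and an empty one give just the new directory, with no colon appended.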

2.4 Test whether CUDA installed successfully by checking the nvcc compiler version

$ nvcc -V
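If you want a script to verify the version, the sketch below pulls the release number out of the `nvcc -V` output. It assumes the output contains a line of the form `Cuda compilation tools, release 8.0, V8.0.61`; the function names are my own:

```python
import re
import subprocess


def nvcc_release(version_text):
    """Extract the 'release X.Y' number from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def read_nvcc_version():
    """Run `nvcc -V` and return its output, or '' if nvcc is not on PATH."""
    try:
        result = subprocess.run(
            ["nvcc", "-V"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    release = nvcc_release(read_nvcc_version())
    if release is None:
        print("nvcc not found; check the PATH line in .bashrc")
    else:
        print("CUDA release:", release)
```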

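III. Install cuDNN (Deep Neural Network library)

3.1 Download cuDNN (make sure to download cuDNN 6.0)

Don't be put off by the hassle: register and join, and the download is then free. When downloading, choose the cuDNN 6.0 build that matches your Ubuntu version and CUDA version.

3.2 Install cuDNN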

  • Navigate to your <cudnnpath> directory containing the cuDNN Debian files. That is, cd into the directory where the three files were downloaded, then install them in order:
$ sudo dpkg -i libcudnn6_6.0.3.11-1+cuda8.0_amd64.deb
  • Install the developer library, for example:
$ sudo dpkg -i libcudnn6-dev_6.0.3.11-1+cuda8.0_amd64.deb
  • Install the code samples and the cuDNN Library User Guide, for example:
$ sudo dpkg -i libcudnn6-doc_6.0.3.11-1+cuda8.0_amd64.deb

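The version number in the 'libcudnn6-...' file name after sudo dpkg -i should match the name of the file you actually downloaded.

Summary: cuDNN is installed simply by dropping files onto your system; no environment variables need to be configured.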
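The .deb file names used in the dpkg commands above encode both the cuDNN version and the CUDA version the package was built for, so a mismatch can be caught before installing. A rough Python sketch, assuming the naming pattern seen in those file names (`<package>_<cudnn>-<rev>+cuda<cuda>_<arch>.deb`); the helpers are hypothetical:

```python
import re

# Pattern inferred from names such as
# libcudnn6-dev_6.0.3.11-1+cuda8.0_amd64.deb (an assumption, not a spec).
_DEB_NAME = re.compile(
    r"^(libcudnn\d+(?:-dev|-doc)?)_([\d.]+)-\d+\+cuda([\d.]+)_(\w+)\.deb$"
)


def parse_cudnn_deb(filename):
    """Split a cuDNN .deb file name into its parts, or return None."""
    match = _DEB_NAME.match(filename)
    if not match:
        return None
    package, cudnn, cuda, arch = match.groups()
    return {"package": package, "cudnn": cudnn, "cuda": cuda, "arch": arch}


def built_for_cuda(filename, cuda_version):
    """True if the file name says the package targets cuda_version."""
    parts = parse_cudnn_deb(filename)
    return parts is not None and parts["cuda"] == cuda_version


if __name__ == "__main__":
    name = "libcudnn6_6.0.3.11-1+cuda8.0_amd64.deb"
    print(parse_cudnn_deb(name), built_for_cuda(name, "8.0"))
```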

IV. Install tensorflow-gpu

4.1 Prepare

TensorFlow also needs the libcupti-dev library, which is the NVIDIA CUDA Profile Tools Interface. This library provides advanced profiling support. To install this library, issue the following command:

sudo apt-get install libcupti-dev

4.2 Install tensorflow-gpu with native pip

sudo apt-get install python3-pip python3-dev # for Python 3.n

pip3 install tensorflow-gpu # Python 3.n; GPU support 

(Optional.) If the step above, '$ pip3 install tensorflow-gpu', failed, install the latest version of TensorFlow by issuing a command of the following format:

sudo pip3 install --upgrade tfBinaryURL   # Python 3.n 

where tfBinaryURL identifies the URL of the TensorFlow Python package. The appropriate value of tfBinaryURL depends on the operating system, Python version, and GPU support. Find the appropriate value for tfBinaryURL in the official TensorFlow installation guide. For example, to install TensorFlow for Linux, Python 3.4, and CPU-only support, issue the following command:

4.3 As with the environment variable setup in section 2.3, append one more environment variable to .bashrc

# Environment variable required by tensorflow

export CUDA_HOME=/usr/local/cuda-8.0
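After editing .bashrc and opening a new terminal, it is worth checking that the variables really took effect. The sketch below inspects an environment mapping for the three CUDA settings used in this guide. The function names and messages are illustrative assumptions, and it checks only the variables, not whether the directories exist:

```python
import os

CUDA_ROOT = "/usr/local/cuda-8.0"


def split_entries(value):
    """Split a colon-separated variable and drop trailing slashes."""
    return [entry.rstrip("/") for entry in value.split(":") if entry]


def missing_settings(env, cuda_root=CUDA_ROOT):
    """Return descriptions of the settings from this guide that env lacks."""
    problems = []
    if cuda_root + "/bin" not in split_entries(env.get("PATH", "")):
        problems.append("PATH lacks " + cuda_root + "/bin")
    lib_dirs = split_entries(env.get("LD_LIBRARY_PATH", ""))
    if cuda_root + "/lib64" not in lib_dirs:
        problems.append("LD_LIBRARY_PATH lacks " + cuda_root + "/lib64")
    if env.get("CUDA_HOME", "").rstrip("/") != cuda_root:
        problems.append("CUDA_HOME is not " + cuda_root)
    return problems


if __name__ == "__main__":
    issues = missing_settings(os.environ)
    print("\n".join(issues) if issues else "CUDA environment looks complete")
```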

4.4 Test whether tensorflow-gpu is configured correctly by running a short piece of code

$ python3

# Now inside the Python interpreter

>>> import tensorflow as tf

>>> hello = tf.constant("hello, tensorflow")

>>> sess = tf.Session()

>>> print(sess.run(hello))

If it prints "hello, tensorflow", the run succeeded. Congratulations.

Appendix: errors I ran into and their solutions

1. Everything was installed, but running it failed with an error saying the native tensorflow runtime could not be loaded:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

Cause: I had installed CUDA 9.0, but tensorflow 1.3 does not yet support it.
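This kind of ImportError can be reproduced without importing tensorflow at all, by asking the dynamic loader for the library directly. A minimal sketch using Python's ctypes; `libcudart.so.8.0` comes from the error message above, while `libcudnn.so.6` is my assumption for the cuDNN 6 library name:

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # libcudnn.so.6 is an assumed name for the cuDNN 6 runtime library.
    for name in ("libcudart.so.8.0", "libcudnn.so.6"):
        status = "found" if can_load(name) else "MISSING"
        print("{:<20} {}".format(name, status))
```

If a library shows as MISSING here, the problem lies in the CUDA installation or LD_LIBRARY_PATH rather than in tensorflow itself.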

Fix:

# I did following steps to remove cuda 9.0

$ sudo apt-get --purge remove cuda

$ sudo apt autoremove

# Then clear apt-cache

$ sudo apt-get clean

# Then I tried following steps to reinstall the cuda 8.0

$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb

$ sudo apt-get update

$ sudo apt-get install cuda

I then ran into another problem: I uninstalled CUDA v9.0, but when I try to install v8.0, v9.0 keeps getting installed instead. How do I prevent this from happening and install 8.0?

NVIDIA's answer: uninstall once more, and when reinstalling, specify the CUDA version number in the last of the three commands above:

$ sudo apt-get install cuda-8-0

Other references:

https://segmentfault.com/a/1190000008234390