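Installing the GPU Version of TensorFlow

0x00 Preface

Installing the CPU version of TensorFlow is very simple, just a matter of a few commands, but installing the GPU version has quite a few pitfalls. This post summarizes the whole installation procedure, along with the problems I ran into and how I solved them.

Overview

Installing the GPU version of TensorFlow differs somewhat from installing the CPU version. Here is a brief outline first; the detailed steps follow.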

  1. Python
  2. NVIDIA CUDA
  3. cuDNN
  4. TensorFlow
  5. Testing

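0x01 Installing Python

There are two ways to install it:

  • Install a basic Python environment and add packages as you need them.
  • Install Anaconda, which already bundles most of the packages you are likely to use.

I always just install Anaconda, which saves trouble: download it from the official site and run the installer. There is nothing more to say about it.

For the basic Python environment, the command is:

sudo apt-get install python-pip python-dev python-virtualenv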

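0x02 Installing NVIDIA CUDA

Installing CUDA involves the following steps:

  • Confirm that the graphics card supports CUDA
  • Confirm that the Linux version supports CUDA
  • Confirm that gcc is installed
  • Confirm the kernel version
  • Disable the open-source driver
  • Shut down the X server
  • Download CUDA
  • Install CUDA

The first few steps are mainly precondition checks. This post is based on Ubuntu 16.04, and the machine already has dual graphics cards, so these checks caused essentially no problems.

The main pitfalls are in installing CUDA itself.

1. Verifying the installation environment

Friendly reminder:

I basically skipped these steps because they rarely cause problems. If you are interested, or unfamiliar with your own system, you can verify them.

  • Confirm that the graphics card supports CUDA
  • Confirm that the Linux version supports CUDA
  • Confirm that gcc is installed
  • Confirm the kernel version

The relevant section of the NVIDIA installation guide is quoted below.
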
2.1. Verify You Have a CUDA-Capable GPU

To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter:

$ lspci | grep -i nvidia

If you do not see any settings, update the PCI hardware database that Linux maintains by entering update-pciids (generally found in /sbin) at the command line and rerun the previous lspci command.

If your graphics card is from NVIDIA and it is listed in http://developer.nvidia.com/cuda-gpus, your GPU is CUDA-capable. The Release Notes for the CUDA Toolkit also contain a list of supported products.

2.2. Verify You Have a Supported Version of Linux

The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes. To determine which distribution and release number you're running, type the following at the command line:

$ uname -m && cat /etc/*release

You should see output similar to the following, modified for your particular system:

x86_64
Red Hat Enterprise Linux Workstation release 6.0 (Santiago)

The x86_64 line indicates you are running on a 64-bit system. The remainder gives information about your distribution.

2.3. Verify the System Has gcc Installed

The gcc compiler is required for development using the CUDA Toolkit. It is not required for running CUDA applications. It is generally installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly. To verify the version of gcc installed on your system, type the following on the command line:

$ gcc --version

If an error message displays, you need to install the development tools from your Linux distribution or obtain a version of gcc and its accompanying toolchain from the Web.

2.4. Verify the System has the Correct Kernel Headers and Development Packages Installed

The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.

While the Runfile installation performs no package validation, the RPM and Deb installations of the driver will make an attempt to install the kernel header and development packages if no version of these packages is currently installed. However, it will install the latest version of these packages, which may or may not match the version of the kernel your system is using. Therefore, it is best to manually ensure the correct version of the kernel headers and development packages are installed prior to installing the CUDA Drivers, as well as whenever you change the kernel version.

The version of the kernel your system is running can be found by running the following command:

$ uname -r

This is the version of the kernel headers and development packages that must be installed prior to installing the CUDA Drivers. This command will be used multiple times below to specify the version of the packages to install. Note that below are the common-case scenarios for kernel usage. More advanced cases, such as custom kernel branches, should ensure that their kernel headers and sources match the kernel build they are running.

RHEL/CentOS

The kernel headers and development packages for the currently running kernel can be installed with:

$ sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

Fedora

The kernel headers and development packages for the currently running kernel can be installed with:

$ sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

OpenSUSE/SLES

Use the output of the uname command to determine the running kernel's version and variant:

$ uname -r
3.16.6-2-default

In this example, the version is 3.16.6-2 and the variant is default. The kernel headers and development packages can then be installed with the following command, replacing <variant> and <version> with the variant and version discovered from the previous uname command:

$ sudo zypper install kernel-<variant>-devel=<version>

Ubuntu

The kernel headers and development packages for the currently running kernel can be installed with:

$ sudo apt-get install linux-headers-$(uname -r)

Source: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
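
The four checks above can be gathered into one small preflight script. This is only a sketch, assuming Python 3.5 or later is available (Ubuntu 16.04 ships it as python3); the helper names find_nvidia_devices, headers_package and run are made up for this illustration and do not come from NVIDIA's guide.

```python
import platform
import shutil
import subprocess


def find_nvidia_devices(lspci_output):
    """Return the lines of `lspci` output that mention NVIDIA (case-insensitive)."""
    return [line for line in lspci_output.splitlines() if "nvidia" in line.lower()]


def headers_package(kernel_release):
    """Name of the Ubuntu kernel-headers package for a given `uname -r` value."""
    return "linux-headers-" + kernel_release


def run(cmd):
    """Run a command and return its stdout, or None if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL,
                                universal_newlines=True)
    except OSError:
        return None
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    lspci = run(["lspci"])
    gpus = find_nvidia_devices(lspci) if lspci else []
    print("NVIDIA devices:", gpus or "none found (or lspci unavailable)")
    print("Architecture:", platform.machine())          # same as `uname -m`
    gcc = run(["gcc", "--version"])
    print("gcc:", gcc.splitlines()[0] if gcc else "not installed")
    # platform.release() reports the same string as `uname -r`
    print("Headers package:", headers_package(platform.release()))
```

The script only reports what it finds; whether the listed GPU and distribution are actually supported still has to be checked against NVIDIA's lists.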

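2. Disabling the open-source driver

Note: some tutorials list more drivers to disable. Here I follow the official instructions, which worked without problems.

Create a new file:

sudo vim /etc/modprobe.d/blacklist-nouveau.conf

with the following contents:

blacklist nouveau
options nouveau modeset=0

Then regenerate the initramfs so the blacklist takes effect at boot: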

sudo update-initramfs -u
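
After rebooting, it can be worth confirming that the nouveau module is really gone before running the CUDA installer. The snippet below is a minimal sketch that reads /proc/modules, where the first field of each line is the module name; the function names are invented for this example.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines()
            if line.strip()}


def nouveau_is_loaded(proc_modules_text):
    """True if the open-source nouveau driver appears among the loaded modules."""
    return "nouveau" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    if nouveau_is_loaded(text):
        print("nouveau is still loaded; check the blacklist file and reboot")
    else:
        print("nouveau is not loaded")
```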

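3. Shutting down the X server

Before installing the NVIDIA driver, the X server must be shut down. There are two ways to do this, depending on which display manager the system runs:

  • Stop gdm
  • Stop lightdm

If the first does not work, try the second. This post used the second (Ubuntu 16.04 uses lightdm by default).

Method 1
sudo /etc/init.d/gdm stop
sudo /etc/init.d/gdm status
Method 2
sudo /etc/init.d/lightdm stop
sudo /etc/init.d/lightdm status

Note: once the graphics driver has been installed, the first thing to do is restart gdm or lightdm.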
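
Instead of trying both methods, the display manager can usually be looked up directly. On Debian and Ubuntu the file /etc/X11/default-display-manager holds the path of the configured display manager. The sketch below assumes that file exists on your system; the function name is hypothetical.

```python
import os


def display_manager_name(file_text):
    """Extract e.g. 'lightdm' from the text '/usr/sbin/lightdm\\n'.

    Returns None when the text is empty.
    """
    path = file_text.strip()
    return os.path.basename(path) if path else None


if __name__ == "__main__":
    try:
        with open("/etc/X11/default-display-manager") as f:
            name = display_manager_name(f.read())
    except OSError:
        name = None  # file missing: not a Debian/Ubuntu layout, or no DM configured
    print("Display manager:", name or "could not be determined")
```

Whatever name it prints is the service to stop before the driver installation and to start again afterwards.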

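4. Downloading CUDA

Note: when choosing the download, I picked the file with the .run suffix. The other package types led to some pitfalls, and this one turned out to be the most stable.

5. Installing CUDA

Note: there are pitfalls during installation. Pay attention to the notes below, otherwise Ubuntu may be unable to enter the graphical interface afterwards. Because of this I spent a whole night reinstalling the operating system, trying three Ubuntu versions and two CentOS versions.

For the other steps, just follow the installer's prompts. The one thing to watch is the option boxed in the screenshot.

Do not install OpenGL when installing CUDA. Remember this, otherwise the graphical desktop may fail to start after installation.

Once the graphics driver has been installed, the first thing to do is restart gdm or lightdm.

6. Adding environment variables

The official documentation requires setting some environment variables.

Open a terminal and edit the file as your normal user, since it lives in your home directory:
$ vim ~/.bash_profile

Append the following to the end of the file: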
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
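
To confirm that a new shell really picked up these variables, a quick check like the following can help. It is a sketch that only inspects the environment of the process it runs in; the two directories are the ones exported above, and missing_library_dirs is a name invented for this example.

```python
import os

# The two directories added to LD_LIBRARY_PATH in the export line above.
REQUIRED_DIRS = [
    "/usr/local/cuda/lib64",
    "/usr/local/cuda/extras/CUPTI/lib64",
]


def missing_library_dirs(ld_library_path, required=REQUIRED_DIRS):
    """Return the required directories that are absent from an LD_LIBRARY_PATH value."""
    entries = [p.rstrip("/") for p in ld_library_path.split(":") if p]
    return [d for d in required if d not in entries]


if __name__ == "__main__":
    missing = missing_library_dirs(os.environ.get("LD_LIBRARY_PATH", ""))
    print("CUDA_HOME =", os.environ.get("CUDA_HOME", "(unset)"))
    print("Missing from LD_LIBRARY_PATH:", missing or "nothing")
```

If directories are reported missing in a fresh terminal, the shell may not be reading ~/.bash_profile; running `source ~/.bash_profile` first is a quick way to tell.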

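0x03 Installing cuDNN

The download page requires registering in advance; just sign up.

Pay attention to the version you download.

The downloaded file is a deb package; install it directly with dpkg -i.

0x04 Installing TensorFlow

The TensorFlow website has very detailed instructions. Whether you use Anaconda or a native Python environment, I recommend installing with pip: it is the simplest route and gives you a recent version. For TensorFlow 1.x the GPU build is published as the tensorflow-gpu package, while the plain tensorflow package is CPU-only.

pip install tensorflow-gpu

Alternatively, installing inside Anaconda has the advantage that you can create a dedicated virtual environment for TensorFlow with conda. In that case, take care to enter the correct TensorFlow package URL (GPU or CPU build, operating system, Python version, and so on).
https://storage.googleapis.com/tensorflow/ contains a list of all TensorFlow packages (in XML format).

Create a virtual environment
$ conda create -n tensorflow

Activate the virtual environment
$ source activate tensorflow
 (tensorflow)$  # Your prompt should change 

Install TensorFlow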
 (tensorflow)$ pip install --ignore-installed --upgrade \
 https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-1.0.1-cp27-cp27m-linux_x86_64.whl
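
Picking the wrong package URL is an easy mistake, because the Python version and platform are encoded in the wheel's file name. The sketch below splits a wheel file name into its tags, following the standard wheel naming scheme (distribution, version, optional build tag, Python tag, ABI tag, platform tag), and compares the Python tag with the running interpreter. The helper names are made up, and the cpXY tag assumes CPython.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError("unexpected wheel file name: " + filename)
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def current_python_tag(version_info=None):
    """Tag such as 'cp27' or 'cp35' for a CPython (major, minor) version."""
    v = sys.version_info if version_info is None else version_info
    return "cp{}{}".format(v[0], v[1])


if __name__ == "__main__":
    wheel = "tensorflow-1.0.1-cp27-cp27m-linux_x86_64.whl"
    tags = parse_wheel_filename(wheel)
    print(tags)
    if tags["python_tag"] != current_python_tag():
        print("This wheel targets", tags["python_tag"],
              "but the interpreter is", current_python_tag())
```

The wheel above carries the cp27 tag, so the environment it is installed into needs Python 2.7.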

0x05 Verifying the installation

Run a small example to verify that everything works.

$ python
Then, enter the following short program inside the python interactive shell:

>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
If the system outputs the following, then you are ready to begin running TensorFlow programs:

Hello, TensorFlow!
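
The hello-world program also succeeds on a CPU-only install, so it does not by itself show that the GPU is being used. One way to check, sketched below for TensorFlow 1.x, is to list the devices TensorFlow can see through device_lib.list_local_devices() and keep those whose device_type is "GPU". The helper gpu_device_names is a name invented for this example.

```python
def gpu_device_names(devices):
    """Return the names of the devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    try:
        # TensorFlow 1.x module that enumerates local devices
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not importable in this environment")
    else:
        names = gpu_device_names(device_lib.list_local_devices())
        print("GPUs visible to TensorFlow:", names or "none")
```

An empty list usually means either the CPU-only package was installed or the CUDA and cuDNN libraries could not be loaded.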