Deep Learning Development Environment Setup, Part 2: Configuring cuDNN and TensorRT on Ubuntu 16.04 with CUDA 9.0.176

1. Installing cuDNN

cuDNN download page: https://developer.nvidia.com/rdp/cudnn-download

Follow the official cuDNN installation guide and install from the prebuilt Debian packages, in the order runtime library, developer library, documentation:

/media/vslyu/home/sinc-lab/Downloads$ sudo dpkg -i libcudnn7_7.1.3.16-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7.
(Reading database ... 189338 files and directories currently installed.)
Preparing to unpack libcudnn7_7.1.3.16-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7 (7.1.3.16-1+cuda9.0) ...
Setting up libcudnn7 (7.1.3.16-1+cuda9.0) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
/media/vslyu/home/sinc-lab/Downloads$ sudo dpkg -i libcudnn7-dev_7.1.3.16-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7-dev.
(Reading database ... 189345 files and directories currently installed.)
Preparing to unpack libcudnn7-dev_7.1.3.16-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7-dev (7.1.3.16-1+cuda9.0) ...
Setting up libcudnn7-dev (7.1.3.16-1+cuda9.0) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode
/media/vslyu/home/sinc-lab/Downloads$ sudo dpkg -i libcudnn7-doc_7.1.3.16-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7-doc.
(Reading database ... 189351 files and directories currently installed.)
Preparing to unpack libcudnn7-doc_7.1.3.16-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7-doc (7.1.3.16-1+cuda9.0) ...
Setting up libcudnn7-doc (7.1.3.16-1+cuda9.0) ...
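
The package file names carry both the cuDNN version and the CUDA version the package was built for (here 7.1.3.16 and cuda9.0). The following is a minimal sketch for reading those two fields out of a file name, so they can be compared with the installed CUDA toolkit before running dpkg. The function names cudnn_deb_version and cudnn_deb_cuda are hypothetical helpers written for this example, not part of any NVIDIA tooling, and the parsing assumes the name layout seen in the log above.

```shell
#!/bin/sh
# Hypothetical helpers: extract fields from a cuDNN .deb file name of the
# form <package>_<version>-<rev>+cuda<X.Y>_<arch>.deb (layout assumed from
# the file names used above).

cudnn_deb_version() {
    name="${1##*/}"        # strip any leading directory
    name="${name#*_}"      # drop "<package>_" (up to the first underscore)
    echo "${name%%-*}"     # keep the text before the first "-"
}

cudnn_deb_cuda() {
    name="${1##*/}"
    name="${name#*+cuda}"  # drop everything up to and including "+cuda"
    echo "${name%%_*}"     # keep the text before the next underscore
}

cudnn_deb_version libcudnn7_7.1.3.16-1+cuda9.0_amd64.deb
cudnn_deb_cuda    libcudnn7_7.1.3.16-1+cuda9.0_amd64.deb
```

Stripping up to the first underscore before looking for the hyphen keeps the helper working for package names that themselves contain a hyphen, such as libcudnn7-dev.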

Build and run the sample that ships with cuDNN to test the installation:

~$ cp -r /usr/src/cudnn_samples_v7/ /home/sinc-lab/LYH/
~$ cd ~/LYH/cudnn_samples_v7/mnistCUDNN
~/LYH/cudnn_samples_v7/mnistCUDNN$ make -j16
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include   -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include   -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o  -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
~/LYH/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7103 , CUDNN_VERSION from cudnn.h : 7103 (7.1.3)
Host compiler version : GCC 5.4.0
There are 4 CUDA capable devices on your machine :
device 0 : sms 28  Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11172, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
device 1 : sms 28  Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11172, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=1
device 2 : sms 28  Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11172, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=2
device 3 : sms 28  Capabilities 6.1, SmClock 1582.0 Mhz, MemSize (Mb) 11172, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=3
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.029696 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.034816 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.142336 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.178400 time requiring 203008 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.234400 time requiring 207360 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.037888 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.078848 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.089088 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.142176 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.153568 time requiring 203008 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

The test passes.
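
The first line of the sample output reports the version both as the integer 7103 and as 7.1.3. In the cuDNN 7 header, CUDNN_VERSION is defined as CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL. The sketch below reverses that encoding and also counts the "Test passed!" lines in a saved log; a complete run like the one above contains two (single precision and half precision). Both function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a cuDNN 7 style version integer into x.y.z.
# Assumes the encoding major*1000 + minor*100 + patch used by cuDNN 7.
decode_cudnn_version() {
    v="$1"
    major=$(( v / 1000 ))
    minor=$(( (v % 1000) / 100 ))
    patch=$(( v % 100 ))
    echo "${major}.${minor}.${patch}"
}

# Hypothetical helper: count "Test passed!" lines in a log read from stdin.
count_passed() {
    grep -c '^Test passed!$'
}

decode_cudnn_version 7103
```

On a real system the log could be captured with ./mnistCUDNN > run.log and then checked with count_passed < run.log.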

2. Installing TensorRT

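Follow NVIDIA's official TensorRT installation documentation and install from the Debian package. Note that in the session below the first attempt at apt-get install tensorrt fails with "Unable to locate package tensorrt"; the package is found only after apt-get update has refreshed the index for the newly added local repository: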

/media/vslyu/home/sinc-lab/Downloads$ sudo dpkg -i nv-tensorrt-repo-ubuntu1604-cuda9.0-ga-trt4.0.1.6-20180612_1-1_amd64.deb
Selecting previously unselected package nv-tensorrt-repo-ubuntu1604-cuda9.0-ga-trt4.0.1.6-20180612.
(Reading database ... 189403 files and directories currently installed.)
Preparing to unpack nv-tensorrt-repo-ubuntu1604-cuda9.0-ga-trt4.0.1.6-20180612_1-1_amd64.deb ...
Unpacking nv-tensorrt-repo-ubuntu1604-cuda9.0-ga-trt4.0.1.6-20180612 (1-1) ...
Setting up nv-tensorrt-repo-ubuntu1604-cuda9.0-ga-trt4.0.1.6-20180612 (1-1) ...
/media/vslyu/home/sinc-lab/Downloads$ sudo apt-get install tensorrt
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package tensorrt
/media/vslyu/home/sinc-lab/Downloads$ sudo apt-get update
Get:1 file:/var/cuda-repo-9-0-local  InRelease
Ign:1 file:/var/cuda-repo-9-0-local  InRelease
Get:2 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  InRelease
Ign:2 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  InRelease
Get:3 file:/var/cuda-repo-9-0-local  Release [574 B]
Hit:4 http://mirrors.hust.edu.cn/ubuntu xenial InRelease
Hit:5 http://mirrors.hust.edu.cn/ubuntu xenial-security InRelease
Get:3 file:/var/cuda-repo-9-0-local  Release [574 B]
Hit:6 http://mirrors.hust.edu.cn/ubuntu xenial-updates InRelease
Hit:7 http://mirrors.hust.edu.cn/ubuntu xenial-proposed InRelease
Hit:8 http://mirrors.hust.edu.cn/ubuntu xenial-backports InRelease
Get:9 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  Release [574 B]
Get:9 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  Release [574 B]
Get:10 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  Release.gpg [819 B]
Get:10 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  Release.gpg [819 B]
Get:12 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  Packages [3,618 B]
Reading package lists... Done
/media/vslyu/home/sinc-lab/Downloads$ sudo apt-get install tensorrt
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libnvinfer-dev libnvinfer-samples libnvinfer4
The following NEW packages will be installed:
  libnvinfer-dev libnvinfer-samples libnvinfer4 tensorrt
0 upgraded, 4 newly installed, 0 to remove and 26 not upgraded.
Need to get 0 B/346 MB of archives.
After this operation, 858 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  libnvinfer4 4.1.2-1+cuda9.0 [36.1 MB]
Get:2 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  libnvinfer-dev 4.1.2-1+cuda9.0 [37.4 MB]
Get:3 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  libnvinfer-samples 4.1.2-1+cuda9.0 [271 MB]
Get:4 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  tensorrt 4.0.1.6-1+cuda9.0 [1,509 kB]
Selecting previously unselected package libnvinfer4.
(Reading database ... 189426 files and directories currently installed.)
Preparing to unpack .../libnvinfer4_4.1.2-1+cuda9.0_amd64.deb ...
Unpacking libnvinfer4 (4.1.2-1+cuda9.0) ...
Selecting previously unselected package libnvinfer-dev.
Preparing to unpack .../libnvinfer-dev_4.1.2-1+cuda9.0_amd64.deb ...
Unpacking libnvinfer-dev (4.1.2-1+cuda9.0) ...
Selecting previously unselected package libnvinfer-samples.
Preparing to unpack .../libnvinfer-samples_4.1.2-1+cuda9.0_amd64.deb ...
Unpacking libnvinfer-samples (4.1.2-1+cuda9.0) ...
Selecting previously unselected package tensorrt.
Preparing to unpack .../tensorrt_4.0.1.6-1+cuda9.0_amd64.deb ...
Unpacking tensorrt (4.0.1.6-1+cuda9.0) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
Setting up libnvinfer4 (4.1.2-1+cuda9.0) ...
Setting up libnvinfer-dev (4.1.2-1+cuda9.0) ...
Setting up libnvinfer-samples (4.1.2-1+cuda9.0) ...
Setting up tensorrt (4.0.1.6-1+cuda9.0) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
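
The session above shows that the order of the steps matters: the repository package has to be registered with dpkg, then the package index refreshed, and only then can tensorrt be installed. As a rough sketch, the three steps can be collected in a small script. The run wrapper and the DRY_RUN variable are assumptions added for illustration (they are not apt or dpkg features); with DRY_RUN left at its default of 1 the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: register the local TensorRT repo, refresh the index, then install.
# DRY_RUN and run() are illustrative helpers, not part of apt or dpkg.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"          # print the command instead of executing it
    else
        "$@"
    fi
}

install_tensorrt() {
    repo_deb="$1"
    run sudo dpkg -i "$repo_deb"        # registers the local repository
    run sudo apt-get update             # without this, apt cannot locate "tensorrt"
    run sudo apt-get install tensorrt
}

install_tensorrt nv-tensorrt-repo-ubuntu1604-cuda9.0-ga-trt4.0.1.6-20180612_1-1_amd64.deb
```

Setting DRY_RUN=0 in the environment would run the commands for real.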

Install the Python interfaces for inference; the installation log is shown below:

/media/vslyu/home/sinc-lab/Downloads$ sudo apt-get install python-libnvinfer-doc swig
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  python-libnvinfer python-libnvinfer-dev swig3.0
Suggested packages:
  swig-doc swig-examples swig3.0-examples swig3.0-doc
The following NEW packages will be installed:
  python-libnvinfer python-libnvinfer-dev python-libnvinfer-doc swig swig3.0
0 upgraded, 5 newly installed, 0 to remove and 26 not upgraded.
Need to get 1,001 kB/4,100 kB of archives.
After this operation, 17.2 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  python-libnvinfer 4.1.2-1+cuda9.0 [1,036 kB]
Get:2 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  python-libnvinfer-dev 4.1.2-1+cuda9.0 [1,122 B]
Get:3 file:/var/nv-tensorrt-repo-cuda9.0-ga-trt4.0.1.6-20180612  python-libnvinfer-doc 4.1.2-1+cuda9.0 [2,062 kB]
Get:4 http://mirrors.hust.edu.cn/ubuntu xenial/universe amd64 swig3.0 amd64 3.0.8-0ubuntu3 [995 kB]
Get:5 http://mirrors.hust.edu.cn/ubuntu xenial/universe amd64 swig amd64 3.0.8-0ubuntu3 [6,278 B]
Fetched 1,001 kB in 0s (4,652 kB/s)
Selecting previously unselected package python-libnvinfer.
(Reading database ... 191219 files and directories currently installed.)
Preparing to unpack .../python-libnvinfer_4.1.2-1+cuda9.0_amd64.deb ...
Unpacking python-libnvinfer (4.1.2-1+cuda9.0) ...
Selecting previously unselected package python-libnvinfer-dev.
Preparing to unpack .../python-libnvinfer-dev_4.1.2-1+cuda9.0_amd64.deb ...
Unpacking python-libnvinfer-dev (4.1.2-1+cuda9.0) ...
Selecting previously unselected package python-libnvinfer-doc.
Preparing to unpack .../python-libnvinfer-doc_4.1.2-1+cuda9.0_amd64.deb ...
Unpacking python-libnvinfer-doc (4.1.2-1+cuda9.0) ...
Selecting previously unselected package swig3.0.
Preparing to unpack .../swig3.0_3.0.8-0ubuntu3_amd64.deb ...
Unpacking swig3.0 (3.0.8-0ubuntu3) ...
Selecting previously unselected package swig.
Preparing to unpack .../swig_3.0.8-0ubuntu3_amd64.deb ...
Unpacking swig (3.0.8-0ubuntu3) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up python-libnvinfer (4.1.2-1+cuda9.0) ...
Setting up python-libnvinfer-dev (4.1.2-1+cuda9.0) ...
Setting up python-libnvinfer-doc (4.1.2-1+cuda9.0) ...
Setting up swig3.0 (3.0.8-0ubuntu3) ...
Setting up swig (3.0.8-0ubuntu3) ...
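
One quick way to see which of these packages ended up installed is to filter the output of dpkg -l. The sketch below is an assumption about how one might do that, not a step taken from the original session: list_trt_packages is a hypothetical helper that reads dpkg -l style text on stdin and prints the name and version of every installed package (state "ii") whose name mentions nvinfer or tensorrt. The demo line at the end is hand-written sample input in the dpkg -l column layout, using a version number from the log above.

```shell
#!/bin/sh
# Hypothetical helper: filter `dpkg -l` style text read from stdin.
# Prints "name version" for installed packages related to TensorRT.
list_trt_packages() {
    awk '$1 == "ii" && $2 ~ /nvinfer|tensorrt/ { print $2, $3 }'
}

# Typical use on a real system:
#   dpkg -l | list_trt_packages

# Demo with a hand-written sample line (not real dpkg output):
printf 'ii  libnvinfer4  4.1.2-1+cuda9.0  amd64  sample description\n' | list_trt_packages
```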

Verify that TensorRT was installed successfully:

Method 2 (recommended): test with the bundled samples:

$ cp -r /usr/src/tensorrt/ ~/LYH
$ cd ~/LYH/tensorrt/samples 
$ make -j16
$ cd ../bin
$ ./sample_int8 mnist

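A successful run produces output similar to the following. On this machine the FP32 and INT8 runs complete, while the FP16 run reports that the engine could not be created at that precision: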

~/LYH/tensorrt/bin$ ./sample_int8 mnist

FP32 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9904, Top5: 1
Processing 40000 images averaged 0.0019213 ms/image and 0.19213 ms/batch.

FP16 run:400 batches of size 100 starting at 100
Engine could not be created at this precision

INT8 run:400 batches of size 100 starting at 100
........................................
Top1: 0.9908, Top5: 1
Processing 40000 images averaged 0.00145806 ms/image and 0.145806 ms/batch.
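
The per-image timings in this output give a rough idea of what INT8 buys on this setup: 0.0019213 ms/image for FP32 against 0.00145806 ms/image for INT8, a ratio of about 1.32. The sketch below extracts that ratio from a saved log. The function name int8_speedup is hypothetical, and the parsing assumes the exact line layout shown above ("FP32 run:", "INT8 run:", and a "Processing ... averaged ..." line per run).

```shell
#!/bin/sh
# Hypothetical helper: read sample_int8 output on stdin and print the
# FP32/INT8 ratio of the "ms/image" figures, rounded to two decimals.
int8_speedup() {
    awk '
        /^FP32 run/ { mode = "fp32" }
        /^FP16 run/ { mode = "fp16" }
        /^INT8 run/ { mode = "int8" }
        /averaged/ {
            # the number right after the word "averaged" is ms/image
            for (i = 1; i < NF; i++)
                if ($i == "averaged") t[mode] = $(i + 1) + 0
        }
        END {
            if (t["fp32"] + 0 > 0 && t["int8"] + 0 > 0)
                printf "%.2f\n", t["fp32"] / t["int8"]
        }'
}

# Typical use on a real system:
#   ./sample_int8 mnist | int8_speedup
```

Nothing is printed if either run is missing from the log, so a skipped precision does not produce a misleading number.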