Learning Caffe (1): Installing Caffe + CUDA + cuDNN + pycaffe + matcaffe on Ubuntu 14.04

阿新 • Published: 2019-01-09

Caffe is a deep learning framework. This article explains how to install GPU-accelerated Caffe on Linux.
The system configuration is:

OS: Ubuntu14.04
CPU: i5-4690
GPU: GTX960
RAM: 8 GB

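CUDA: version 7.0 or later, together with the latest graphics driver, is recommended.
BLAS: ATLAS, MKL, or OpenBLAS. Provides the matrix operations used from C++.
Boost >= 1.55: supplies some math functions, among other utilities.
protobuf: a lightweight, efficient format for structured data. It can serialize structured data and is well suited to data storage or as an RPC data-exchange format.
glog and gflags: Google's logging library and command-line argument parsing library, respectively. Handy for debugging.
hdf5:
lmdb, leveldb: database I/O, used when preparing data.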

Optional dependencies:

OpenCV >= 2.4 including 3.0
IO libraries: lmdb, leveldb (note: leveldb requires snappy)
cuDNN for GPU acceleration (v5)

Pycaffe:
Python 2.7 or Python 3.3+, numpy (>= 1.7), boost-provided boost.python

Matcaffe:
MATLAB with the mex compiler

Installing CUDA 7.5

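CUDA on Wikipedia: https://zh.wikipedia.org/wiki/CUDA
CUDA (Compute Unified Device Architecture) is a unified computing technology introduced by NVIDIA and is the company's official name for GPGPU. It lets users run computations on NVIDIA GPUs from the GeForce 8 series onward and on newer Quadro GPUs. It was also the first development environment in which a C compiler could target the GPU.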

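Installation procedure

1. Download CUDA

Download CUDA: https://developer.nvidia.com/cuda-downloads

Choose the deb package (or the runfile). After the download finishes, use md5sum to check that the file is intact, then install CUDA by following the official CUDA documentation.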
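The integrity check can also be scripted. The following is a minimal sketch using Python's standard hashlib; the function names are illustrative, and the published checksum is assumed to be the one listed on the download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare the file's digest with a published checksum, ignoring case and whitespace."""
    return md5_of_file(path) == published_md5.strip().lower()
```

Reading in chunks keeps memory use flat even for an installer that is over a gigabyte in size.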

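2. Install

First stop the desktop display manager lightdm, switch to a text console, and install CUDA from there. (The CUDA package bundles the graphics driver, and the display manager must be stopped before a driver is installed.)
(Alternatively, install the graphics driver and the CUDA libraries separately.)

sudo service lightdm stop

Change to the directory containing the deb package and run the following commands: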

sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb  
sudo apt-get update  
sudo apt-get install cuda

Then reboot the computer: sudo reboot
Note that the CUDA package already includes a fairly recent graphics driver.

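3. Configure environment variables

Add the bin directory under the CUDA installation directory to the system search path PATH, then reload the shell configuration so the change takes effect.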
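As a rough sketch of this step, the helper below produces the export line and adds it to the text of a shell startup file only if it is not already there. The install prefix /usr/local/cuda matches the paths used elsewhere in this article; the choice of startup file (for example ~/.bashrc) and both function names are assumptions.

```python
def cuda_path_export(cuda_home="/usr/local/cuda"):
    """Build the shell line that puts the CUDA bin directory first on PATH."""
    return "export PATH={}/bin:$PATH".format(cuda_home)


def ensure_line(profile_text, line):
    """Return profile_text with `line` appended, unless it is already present."""
    if line in profile_text.splitlines():
        return profile_text
    if profile_text and not profile_text.endswith("\n"):
        profile_text += "\n"
    return profile_text + line + "\n"
```

After writing the result back to the startup file, re-reading it in the current shell (for example with source ~/.bashrc) makes the new PATH take effect.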

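Add the dynamic library search path: create a file named cuda.conf in /etc/ld.so.conf.d/ with the following content:

/usr/local/cuda/lib64

After saving, run the following command so it takes effect immediately: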

sudo ldconfig

4. Verify

Check the version of NVCC, the CUDA C compiler:

nvcc -V
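To check the reported release in a script, something like the sketch below could parse the text printed by nvcc -V. The sample line reflects the usual shape of that output and is an assumption, as is the function name.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from text such as 'release 7.5' in `nvcc -V` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative sample line; real output has a few more header lines.
sample = "Cuda compilation tools, release 7.5, V7.5.17"
release = parse_nvcc_release(sample)
```

Comparing the tuple against (7, 0) gives a quick test of the "7.0 or later" recommendation.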

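Build and run the samples: enter the samples directory under the CUDA directory and run make there, which takes ten minutes or so. When the build finishes, change into the bin/x86_64/linux/release/ directory inside samples.
Run the deviceQuery program and check its output, paying particular attention to the last line: PASS means the test passed.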
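The last-line check can be automated. This sketch assumes the final non-empty line of deviceQuery output reads exactly "Result = PASS", which is how the sample normally reports success; the function name is illustrative.

```python
def device_query_passed(output):
    """Return True if the last non-empty line of deviceQuery output reports PASS."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == "Result = PASS"
```

The captured output of ./deviceQuery can be passed straight to this function.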

5. gcc compiler version

This version of CUDA does not support the gcc 5.0 compiler.
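A quick pre-flight check might look like the sketch below. It assumes the restriction applies to gcc 5 and anything newer, and that the version text comes from gcc -dumpversion; both function names are hypothetical.

```python
def gcc_major(version_text):
    """Return the major version from text such as '4.8.4' or '7'."""
    return int(version_text.strip().split(".")[0])


def gcc_ok_for_this_cuda(version_text):
    """Assumption: this CUDA release accepts only gcc versions below 5."""
    return gcc_major(version_text) < 5
```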

Installing cuDNN

Installing BLAS

Install ATLAS with sudo apt-get install libatlas-base-dev, or install OpenBLAS or MKL for better CPU performance.

Downloading Caffe

Installing Caffe dependencies

General dependencies:

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev

Ubuntu 14.04 dependencies:

sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev

PyCaffe dependencies

Enter the caffe/python directory and install the dependencies:

for req in $(cat requirements.txt); do pip install $req; done

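The Caffe website recommends Anaconda (http://continuum.io/downloads#all). Anaconda is a scientific computing environment similar to Canopy but more convenient to use, and its bundled package manager, conda, is also powerful.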

MatCaffe

Install MATLAB R2014a

Compiling Caffe

Copy Makefile.config.example to Makefile.config and edit it:

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0

# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
#   You should not set this flag if you will be reading LMDBs with any
#   possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3
# OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
MATLAB_DIR := /usr/local/MATLAB/R2014a
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
        /usr/local/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
#        $(ANACONDA_HOME)/include/python2.7 \
#        $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

# Uncomment to use Python 3 (default is Python 2)
# PYTHON_LIBRARIES := boost_python3 python3.5m
# PYTHON_INCLUDE := /usr/include/python3.5m \
#                 /usr/lib/python3.5/dist-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @
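The CUDA_ARCH value in the configuration above follows a regular pattern: one sm_XX target per compute capability, plus a PTX target for the newest one. The sketch below regenerates such flags for a chosen list. The function name is hypothetical, and using capability 52 for the GTX 960 is an assumption taken from NVIDIA's published specifications; the stock settings above already run on that card through the compute_50 PTX target.

```python
def gencode_flags(capabilities, ptx_for_newest=True):
    """Build nvcc -gencode flags for compute capabilities given as strings like '52'."""
    flags = [
        "-gencode arch=compute_{0},code=sm_{0}".format(cc) for cc in capabilities
    ]
    if ptx_for_newest and capabilities:
        newest = capabilities[-1]
        # Embed PTX for the newest capability so later GPUs can JIT-compile it.
        flags.append("-gencode arch=compute_{0},code=compute_{0}".format(newest))
    return flags


# Hypothetical narrowed setting for a single Maxwell card.
cuda_arch = " \\\n        ".join(gencode_flags(["52"]))
```

Restricting the list to the installed GPU mainly shortens the build; it does not change how the compiled code behaves on that GPU.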

Enter the caffe directory and run:

make all
make test
make runtest

If no errors are reported, the build is complete.

Compiling pycaffe and matcaffe

Enter the caffe directory and run:

make pycaffe
make matcaffe

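Caffe Python interface

Copy caffe/python/caffe into /usr/local/lib/python2.7/dist-packages/.
Copy the library files under caffe/build/lib/ into /usr/local/lib, then refresh the linker cache:

sudo ldconfig

Open python and run import caffe; it should load without errors.
Alternatively, add the path at import time: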

import sys
caffe_dir = '/path/to/caffe/python'  # placeholder: the python directory of your Caffe checkout
sys.path.insert(0, caffe_dir)
import caffe

Caffe C++ interface

Copy the include and lib directories separately.

Debugging Caffe

Build Caffe with CMake so that it can be debugged in CLion. Set the build options in CMakeLists.txt.

Problems encountered

Library error

When building Caffe with cmake, the following error occurred:

Linking CXX shared library ../../lib/libcaffe-d.so
/usr/bin/ld: /usr/local/lib/libcblas.a(cblas_sgemv.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libcblas.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [lib/libcaffe-d.so.1.0.0-rc3] Error 1
make[1]: *** [src/caffe/CMakeFiles/caffe.dir/all] Error 2
make: *** [all] Error 2

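Fix: edit CMakeCache.txt in the cbuild folder and change

//Path to a library.
Atlas_CBLAS_LIBRARY:FILEPATH=path to libcblas.a

to

//Path to a library.
Atlas_CBLAS_LIBRARY:FILEPATH=path to libcblas.so on your machine

The library had probably been installed on this machine several times in different ways, leaving the files in a muddle so that the correct library was not picked up.
To build and install Caffe on Ubuntu 14.04 with make + cmake, enter the cmake build directory and run make.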
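The CMakeCache.txt change described above can be sketched as a small text rewrite. The key name comes from the cache entry quoted in this section; the function name and the example library paths are hypothetical and must be replaced with the real locations on your machine.

```python
def point_cblas_to_shared(cache_text, shared_lib_path):
    """Rewrite the Atlas_CBLAS_LIBRARY entry of CMakeCache.txt to a shared library."""
    key = "Atlas_CBLAS_LIBRARY:FILEPATH="
    out = []
    for line in cache_text.splitlines():
        if line.startswith(key):
            line = key + shared_lib_path
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative cache fragment; both paths here are placeholders.
before = (
    "//Path to a library.\n"
    "Atlas_CBLAS_LIBRARY:FILEPATH=/usr/local/lib/libcblas.a\n"
)
after = point_cblas_to_shared(before, "/usr/lib/libcblas.so")
```

Only the matching cache entry is touched, so comments and every other cached variable pass through unchanged.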

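Library conflict

The system protobuf library is version 2.6, while the Python protobuf library is version 3.3.
Fix: update the system protobuf library by downloading the protobuf source and compiling and installing it by hand. Remember to run sudo ldconfig at the end.
Note that if you use Anaconda, its libraries also include protobuf, so take care to avoid a conflict there as well.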
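A mismatch like this can be spotted before building. The sketch below compares two version strings, one as printed by protoc --version (for example "libprotoc 2.6.1") and one as exposed by the Python package in google.protobuf.__version__. Treating equal major and minor numbers as "matching" is a simplifying assumption, and the function names are illustrative.

```python
def parse_version(text):
    """Parse the last whitespace-separated token, e.g. 'libprotoc 2.6.1' -> (2, 6, 1)."""
    return tuple(int(part) for part in text.strip().split()[-1].split("."))


def same_major_minor(a, b):
    """Rough compatibility test: the first two version components must agree."""
    return a[:2] == b[:2]


system_version = parse_version("libprotoc 2.6.1")  # sample protoc output
python_version = parse_version("3.3.0")            # sample Python package version
conflict = not same_major_minor(system_version, python_version)
```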

Testing

Prepare the data

cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh

LeNet: the MNIST Classification Model

…

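Upgrading to CUDA 8.0

After installing CUDA 8.0 and rebooting, the compiled CUDA samples would not run and reported an error.
The likely cause was that the driver version had not been updated.
Look through the files under /etc/modprobe.d/ and open nvidia-graphics-drivers.conf:
Change alias nvidia nvidia_352 to alias nvidia nvidia_367 (the exact name depends on what the module generated by the NVIDIA driver is called).
Change alias nvidia-uvm nvidia_352-uvm to alias nvidia-uvm nvidia_367-uvm.
This solved the problem.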
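The two alias edits above follow one pattern, so they can be sketched as a single text rewrite. The version numbers are the ones from this article; the function name is hypothetical, and the result should be reviewed before it is written back to the file.

```python
import re


def retarget_aliases(conf_text, old_version, new_version):
    """Point 'alias <name> nvidia_<old>...' lines at the new driver version."""
    pattern = re.compile(
        r"^(alias[ \t]+\S+[ \t]+nvidia_)" + re.escape(old_version) + r"(\S*)[ \t]*$",
        re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + new_version + m.group(2), conf_text)


conf = "alias nvidia nvidia_352\nalias nvidia-uvm nvidia_352-uvm\n"
updated = retarget_aliases(conf, "352", "367")
```

Lines that do not match the alias pattern, such as blacklist entries, are left as they are.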

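The system upgraded the kernel

After a reboot the system had automatically upgraded the kernel, so the NVIDIA driver had to be reinstalled, but the driver would not install. It turned out that the new kernel version has a bug that makes the NVIDIA driver installation fail.
I chose to roll the kernel back to the previous version. This requires editing the grub configuration; see http://blog.csdn.net/dl_chenbo/article/details/52400044

After rolling back the kernel, I reinstalled driver 384 and the matching CUDA 9. Running deviceQuery still failed: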

cudaGetDeviceCount returned 30

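This error usually means the NVIDIA driver was not installed correctly and was not loaded into the kernel. So I checked which NVIDIA kernel modules were currently loaded: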

lsmod | grep nvidia

The nvidia_uvm module was missing.
I then tried loading the module by hand with:

modprobe nvidia-uvm

This failed with a message that nvidia_384_uvm could not be inserted. Run

sudo updatedb    # refresh the locate database
locate --regex nvidia.*uvm.ko

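The kernel module turned out to be nvidia-uvm.ko.
The likely cause was that somewhere I had aliased nvidia-uvm to nvidia_384_uvm, so modprobe tried to load nvidia_384_uvm while the actual module file was nvidia-uvm.ko. Checking /etc/modprobe.d/nvidia-graphics-drivers.conf confirmed that such an alias was there.

After commenting it out, load the module again: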

sudo modprobe nvidia-uvm

Success!
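The lsmod check used during this diagnosis can be scripted as well. The sketch below reads lsmod output and reports which expected modules are absent. The required module names follow the ones discussed in this section, while the function names and the sample text (including its sizes and counts) are made up for illustration.

```python
def loaded_modules(lsmod_output):
    """Collect module names from the first column of lsmod output."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":  # skip the header row if present
            names.add(fields[0])
    return names


def missing_modules(lsmod_output, required=("nvidia", "nvidia_uvm")):
    """Return the required modules that lsmod does not list."""
    loaded = loaded_modules(lsmod_output)
    return [name for name in required if name not in loaded]


# Illustrative output of `lsmod | grep nvidia`; the numbers are invented.
sample = "nvidia_modeset 860160 2 nvidia_drm\nnvidia 13131776 1 nvidia_modeset\n"
missing = missing_modules(sample)
```

An empty result after the fix confirms that both modules are loaded.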

Installing from runfiles

Download the driver and CUDA separately, then install each one.