
CUDA + cuDNN version problems in a TensorFlow runtime environment

Problem

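On a CentOS Linux release 7.3.1611 server I had previously installed TensorFlow 1.0, CUDA 8.0 and cuDNN v5.1, and TensorFlow programs used to run fine. After the machine had sat unused for a while, a small problem appeared, so I looked into it. Running a TF program printed: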

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/lib:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
···
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:82:00.0
Total memory: 11.17GiB
Free memory: 2.08GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:82:00.0)
F tensorflow/stream_executor/cuda/cuda_dnn.cc:222] Check failed: s.ok() could not find cudnnCreate in cudnn DSO; dlerror: /usr/local/python3/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cudnnCreate
Aborted (core dumped)

The log says libcudnn.so.5 cannot be opened, so I checked the /usr/local/cuda-8.0/lib64 directory.
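Instead of eyeballing a directory listing, the check can be scripted. This is a minimal sketch, assuming Python 3 on Linux; the helper names list_cudnn_sonames and has_soname are hypothetical, and the default path is the one from the LD_LIBRARY_PATH in the log above. It lists every libcudnn.so.* entry and tests whether the exact name TensorFlow asked for is present.

```python
import glob
import os
from typing import List


def list_cudnn_sonames(lib_dir: str) -> List[str]:
    """Return the libcudnn.so.* entries (files or symlinks) in lib_dir, sorted."""
    pattern = os.path.join(lib_dir, "libcudnn.so.*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def has_soname(lib_dir: str, soname: str) -> bool:
    """True if lib_dir has an entry named exactly soname (lexists also counts dangling links)."""
    return os.path.lexists(os.path.join(lib_dir, soname))


if __name__ == "__main__":
    lib_dir = "/usr/local/cuda-8.0/lib64"  # taken from LD_LIBRARY_PATH in the log
    print(list_cudnn_sonames(lib_dir))
    print("libcudnn.so.5 present:", has_soname(lib_dir, "libcudnn.so.5"))
```

On the server described here, this would have shown only the cuDNN 6 and 7 entries and reported libcudnn.so.5 as missing.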
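It really was missing. cuDNN had been upgraded at some point, and the system now had versions 6 and 7 installed but no version 5. No problem, I thought, a symlink will do (run inside that directory): ln -s libcudnn.so.7 libcudnn.so.5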
However, running the program again gave:

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
···
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:82:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:390] Loaded runtime CuDNN library: 7004 (compatibility version 7000) but source was compiled with 5105 (compatibility version 5100).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Aborted (core dumped)
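The numbers in this error are cuDNN version integers. For the cuDNN releases involved here, cudnn.h defines CUDNN_VERSION as CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL, so 7004 is 7.0.4 and 5105 is 5.1.5: the symlink got past the dynamic loader, but the library behind libcudnn.so.5 still identifies itself as cuDNN 7. A minimal sketch, assuming Python 3 with ctypes on Linux (the helper names are hypothetical), that decodes such an integer and asks whatever library a soname resolves to for its real version through cuDNN's cudnnGetVersion():

```python
import ctypes
from typing import Optional, Tuple


def decode_cudnn_version(v: int) -> Tuple[int, int, int]:
    """Split MAJOR*1000 + MINOR*100 + PATCH (cuDNN 5.x to 7.x encoding) into parts."""
    return v // 1000, (v % 1000) // 100, v % 100


def runtime_cudnn_version(soname: str) -> Optional[int]:
    """dlopen soname and call cudnnGetVersion(); None if the loader cannot open it."""
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        return None
    lib.cudnnGetVersion.restype = ctypes.c_size_t  # size_t cudnnGetVersion(void)
    lib.cudnnGetVersion.argtypes = []
    return int(lib.cudnnGetVersion())


if __name__ == "__main__":
    print(decode_cudnn_version(7004))  # version loaded at runtime in the log
    print(decode_cudnn_version(5105))  # version TensorFlow 1.0 was compiled against
    v = runtime_cudnn_version("libcudnn.so.5")
    print("libcudnn.so.5 ->", None if v is None else decode_cudnn_version(v))
```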

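Still an error: the loaded cuDNN 7 is incompatible, and TensorFlow wants cuDNN 5.1. So the version is too new? Most of the blog posts I found described the opposite case, a cuDNN version that was too old. I then checked NVIDIA's CUDA site for which cuDNN versions go with CUDA 8.0.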
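So linking to cuDNN 7 was wrong, and I pointed the link at cuDNN 6 instead: ln -sf libcudnn.so.6 libcudnn.so.5 (the -f is needed because libcudnn.so.5 already exists from the previous step; plain ln -s would fail with "File exists").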
After that, the program ran successfully.
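Repointing the link can also be scripted. This is a minimal sketch, assuming Python 3 on Linux and write permission on the library directory; repoint_symlink is a hypothetical helper, not part of any library. It reaches the same end state as ln -sf, but builds the new link under a temporary name and renames it over the old one with os.replace, so there is no moment where libcudnn.so.5 is absent.

```python
import os


def repoint_symlink(link_path: str, target: str) -> None:
    """Make link_path a symlink to target, replacing any existing entry."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)  # clear a leftover from an interrupted earlier run
    # A relative target resolves against the link's own directory, as with ln -s.
    os.symlink(target, tmp)
    os.replace(tmp, link_path)  # rename over the old link in one step


if __name__ == "__main__":
    repoint_symlink("/usr/local/cuda-8.0/lib64/libcudnn.so.5", "libcudnn.so.6")
```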

Summary

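There were two problems: the cuDNN library TensorFlow asked for did not exist, and the cuDNN library it then found was the wrong version.
The fixes are simple, but when setting up a GPU computing environment, pay attention to how the pieces match up: OS version, GPU compute capability, CUDA version, and cuDNN version.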
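One way to make that matching explicit is a small checklist. This is a minimal sketch with hypothetical helpers: the required versions must come from the documentation of your TensorFlow build (here CUDA 8.0 and cuDNN 5.1, as the log above reports), and the minimum compute capability is a parameter you supply, not a value stated in this post. Versions are compared by prefix, so 5.1.5 satisfies 5.1 but 7.0.4 does not.

```python
from typing import Dict, List


def find_mismatches(required: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """List components whose installed version does not match the required prefix."""
    problems = []
    for component, want in sorted(required.items()):
        have = installed.get(component)
        if have is None:
            problems.append("{}: missing (need {})".format(component, want))
        elif not (have == want or have.startswith(want + ".")):
            problems.append("{}: have {}, need {}".format(component, have, want))
    return problems


def meets_compute_capability(have: str, minimum: str) -> bool:
    """Compare compute capabilities such as '3.5' numerically, (major, minor)."""
    def parse(s: str):
        return tuple(int(x) for x in s.split("."))
    return parse(have) >= parse(minimum)


if __name__ == "__main__":
    required = {"cuda": "8.0", "cudnn": "5.1"}
    installed = {"cuda": "8.0", "cudnn": "7.0.4"}  # the state after the first symlink
    print(find_mismatches(required, installed))
    print(meets_compute_capability("3.5", "3.0"))  # Tesla K40m reports major 3, minor 5
```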