Installing CUDA 8.0, Anaconda 4.4.0, and TensorFlow 1.2.1 on Ubuntu 16.04
Posted by 阿新 · Published: 2019-01-10
Original article: http://blog.csdn.net/jinzhuojun/article/details/77140806
- CUDA
sudo apt-get remove --purge 'nvidia*'
sudo apt-get update
sudo apt-get install dkms build-essential linux-headers-generic
sudo vim /etc/modprobe.d/blacklist.conf
Add the following to blacklist.conf:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
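Before rebooting, it can be worth confirming that the entries were actually saved. The following is a minimal sketch, not part of the original procedure: `has_nouveau_blacklist` is a hypothetical helper name, and it only looks for the `blacklist nouveau` line in the file it is given (defaulting to the path edited above).

```shell
# Hypothetical helper: succeed if the given modprobe config file
# contains a "blacklist nouveau" line.
has_nouveau_blacklist() {
    local conf_file="${1:-/etc/modprobe.d/blacklist.conf}"
    grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$conf_file" 2>/dev/null
}

if has_nouveau_blacklist; then
    echo "nouveau is blacklisted"
else
    echo "nouveau is NOT blacklisted yet"
fi
```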
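Reboot. If the graphical interface fails to come up, reinstall the Unity packages, then start the desktop environment with:
sudo service lightdm start
To install the driver, stop the display manager, add the graphics-drivers PPA, and install the driver package:
sudo service lightdm stop
sudo add-apt-repository ppa:graphics-drivers/ppa && sudo apt-get update
sudo apt-get install nvidia-375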
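After the reboot and driver install, a quick check that nouveau is no longer loaded can save debugging time later. This is a sketch under assumptions: `nouveau_loaded` is a hypothetical helper that reads a module list in the `/proc/modules` format (the same data `lsmod` shows), and `nvidia-smi` is only called if the driver package installed it.

```shell
# Hypothetical helper: succeed if a module named "nouveau" appears in the
# given module list (defaults to /proc/modules, which lsmod also reads).
nouveau_loaded() {
    local modules_file="${1:-/proc/modules}"
    grep -q '^nouveau ' "$modules_file" 2>/dev/null
}

if nouveau_loaded; then
    echo "nouveau is still loaded; check the blacklist and reboot"
else
    echo "nouveau is not loaded"
fi

# nvidia-smi ships with the NVIDIA driver; only call it if it is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || echo "nvidia-smi is installed but could not talk to the driver"
fi
```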
- Anaconda
bash ~/Downloads/Anaconda2-4.4.0-Linux-x86_64.sh
You can then create environments, for example one for Python 3.5 and one for Python 2.7:
conda create --name py35 python=3.5
conda create --name py27 python=2.7
Here py27 and py35 are the environment names. To enter an environment, use:
source activate <env name>
To leave it, use:
source deactivate
List the packages in the current environment:
conda list
Delete an environment with:
conda remove --name <env name> --all
List the existing environments:
conda env list
List the packages installed in a given environment:
conda list --name=<env name>
For more usage, see: https://conda.io/docs/using/envs.html
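When these commands end up in a setup script, it helps to create an environment only if it does not exist yet. A minimal sketch follows, assuming the `conda env list` output keeps the environment name in the first column; `env_listed` is a hypothetical helper, not a conda command.

```shell
# Hypothetical helper: read `conda env list` output on stdin and succeed
# if an environment with the given name appears in the first column.
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage (assumes conda is on PATH; left as a comment so the sketch
# does nothing on machines without conda):
#   conda env list | env_listed py35 || conda create --name py35 python=3.5
```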
Inside an environment you can install packages with either conda install or the traditional pip install. On a poor network connection the download may time out:
ReadTimeoutError: HTTPSConnectionPool(host='pypi.python.org', port=443): Read timed out.
If the cause really is just a slow connection, extending the timeout solves it:
pip --default-timeout=10000 install -U <package name>
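If the timeouts are intermittent rather than consistently slow, retrying the command a few times is another option. The sketch below is a generic retry wrapper; `retry` is a hypothetical helper name and the attempt count is an arbitrary choice.

```shell
# Hypothetical helper: run a command up to N times, stopping at the
# first success. Usage: retry N command [args...]
retry() {
    local attempts="$1"
    shift
    local n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed" >&2
        n=$((n + 1))
    done
    return 1
}

# Example (left as a comment because it needs network access):
#   retry 3 pip --default-timeout=10000 install -U <package name>
```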
If you run into the following error during use:
ValueError: failed to parse CPython
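the Anaconda environment may be getting mixed up with the per-user Python packages under your home directory. One fix is to open anaconda2/lib/python2.7/site.py and set ENABLE_USER_SITE = False.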
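The edit can also be scripted. The following is a sketch only: `disable_user_site` is a hypothetical helper, the example path assumes Anaconda was installed to `~/anaconda2`, and a `.bak` copy of the file is kept. Python also honors the `PYTHONNOUSERSITE` environment variable, which skips the user site directory without editing any file; whether that alone resolves this particular error is untested here.

```shell
# Hypothetical helper: set ENABLE_USER_SITE = False in a site.py file,
# keeping a backup copy next to it with a .bak suffix.
disable_user_site() {
    sed -i.bak 's/^ENABLE_USER_SITE = .*/ENABLE_USER_SITE = False/' "$1"
}

# Example (path is an assumption; adjust it to your install):
#   disable_user_site ~/anaconda2/lib/python2.7/site.py
```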
- TensorFlow
source activate py35
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.1-cp35-cp35m-linux_x86_64.whl
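pip only installs a wheel whose Python tag matches the active interpreter, so a `cp35` wheel like the one above needs a Python 3.5 environment (for example py35), while a Python 2.7 environment needs a `cp27` wheel. As a small sketch, the tag can be read straight from the file name; `wheel_py_tag` is a hypothetical helper and assumes the wheel name carries no optional build tag.

```shell
# Hypothetical helper: print the Python tag (e.g. cp35) of a wheel, given
# its file name or URL. Wheel names look like
#   name-version-pythontag-abitag-platform.whl
wheel_py_tag() {
    basename "$1" | cut -d- -f3
}

wheel_py_tag "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.1-cp35-cp35m-linux_x86_64.whl"
```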
Then quickly check that it imports cleanly:
python -c "import tensorflow as tf;print(tf.__version__);"
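If it prints the version you just installed, you are nearly done.
However, the official prebuilt package is not compiled with the x86 SIMD instruction optimizations (SSE/AVX/FMA), so training prints messages like these: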
2017-08-12 20:10:39.973508: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-12 20:10:39.973536: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-12 20:10:39.973541: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-12 20:10:39.973545: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-12 20:10:39.973549: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
The head-in-the-sand approach is to raise the log level so the warnings are out of sight:
export TF_CPP_MIN_LOG_LEVEL=2
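Level 2 hides both INFO and WARNING messages from TensorFlow's C++ side, and `export` applies that to everything started from the shell session. To limit the effect to a single command, the variable can be set only for that command. A minimal sketch, where `with_quiet_tf` is a hypothetical wrapper name and `train.py` is a placeholder script:

```shell
# Hypothetical wrapper: run one command with TF_CPP_MIN_LOG_LEVEL=2
# without changing the environment of the calling shell.
with_quiet_tf() {
    env TF_CPP_MIN_LOG_LEVEL=2 "$@"
}

# Example (train.py is a placeholder):
#   with_quiet_tf python train.py
```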
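But this also filters out other log messages. Moreover, the x86 SIMD instructions can in some cases bring a several-fold performance improvement, so it is worth considering building a version with these optimizations yourself. First download the source code, then check out the branch for the version you want (for example r1.2):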
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout r1.2
./configure
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
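The `--copt` flags above only help if the CPU actually supports the corresponding instructions; a binary built with, say, `-mavx2` will crash on a machine without AVX2. As a rough sketch, the supported subset can be derived from the `flags` line of `/proc/cpuinfo`. `suggest_copts` is a hypothetical helper, and it does not cover `-mfpmath=both`, which is not tied to a single CPU flag.

```shell
# Hypothetical helper: print the --copt options from the build command
# above that the CPU described by a cpuinfo file supports.
suggest_copts() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local flags out="" pair cpu_flag gcc_flag
    flags="$(grep -m1 '^flags' "$cpuinfo" 2>/dev/null || true)"
    # Each pair maps a /proc/cpuinfo flag name to the matching GCC option.
    for pair in sse4_2:-msse4.2 avx:-mavx avx2:-mavx2 fma:-mfma; do
        cpu_flag="${pair%%:*}"
        gcc_flag="${pair#*:}"
        if printf '%s\n' "$flags" | grep -qw -- "$cpu_flag"; then
            out="$out --copt=$gcc_flag"
        fi
    done
    printf '%s\n' "${out# }"
}

suggest_copts
```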
If the build fails with the following error:
Loading:
Loading: 0 packages loaded
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
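This is a known issue (https://github.com/tensorflow/tensorflow/pull/11949). The fix is in https://github.com/tensorflow/tensorflow/pull/11949/commits/c5d311eaf8cc6471643b5c43810a1feb19662d6c, which at the time of writing does not appear to have been picked into the release branch. Cherry-picking it by hand should resolve the error.
Once the build finishes, use the command below to generate the whl package in a directory of your choice (for example ~/tmp/), then install it the same way as before.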
bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tmp/
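The name of the generated wheel depends on the version and Python tag, so it is convenient to pick it up by pattern instead of typing it out. A minimal sketch, where `latest_wheel` is a hypothetical helper that returns the most recently modified `.whl` file in a directory:

```shell
# Hypothetical helper: print the path of the newest .whl file in a
# directory, or nothing if the directory contains no wheels.
latest_wheel() {
    ls -t "$1"/*.whl 2>/dev/null | head -n 1
}

# Example (left as a comment because it modifies the active environment):
#   pip install --ignore-installed --upgrade "$(latest_wheel ~/tmp)"
```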
If you get the following error at run time:
ImportError: Traceback (most recent call last):
  File "tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
ImportError: No module named pywrap_tensorflow_internal
According to https://stackoverflow.com/questions/35953210/error-running-basic-tensorflow-example, the fix is simply to cd to a directory outside the TensorFlow source tree.
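The reason is that Python searches the current directory first, so inside the source tree `import tensorflow` finds the unbuilt `tensorflow/` source directory instead of the installed package. The check below is a simple heuristic sketch; `shadows_tensorflow` is a hypothetical helper that only tests for a `tensorflow` subdirectory.

```shell
# Hypothetical helper: succeed if the given directory (default: the
# current one) contains a "tensorflow" subdirectory that could shadow
# the installed package on import.
shadows_tensorflow() {
    [ -d "${1:-.}/tensorflow" ]
}

if shadows_tensorflow; then
    echo "a local tensorflow/ directory is present; cd elsewhere before importing"
else
    echo "no local tensorflow/ directory here"
fi
```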